graphio documentation
Graphio is a Python library for bulk loading data into Neo4j. Graphio collects multiple sets of nodes and relationships and loads them into Neo4j. A common example is parsing a set of Excel files to create a Neo4j prototype. Graphio only loads data; it is not meant for querying Neo4j and returning data.
Graphio works with NodeSet and RelationshipSet classes, which are groups of nodes and relationships with similar properties. Graphio can load these data sets to Neo4j using CREATE or MERGE operations.
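The difference between the two operations can be illustrated with a toy in-memory sketch. This is not graphio code and makes no claim about how graphio implements loading; the helpers `create` and `merge` and the list-based store are hypothetical stand-ins for the general CREATE and MERGE semantics in Neo4j.

```python
# Toy in-memory illustration of CREATE vs. MERGE semantics.
# Not graphio code: `create` and `merge` are hypothetical helpers and
# `store` is a plain list standing in for the database.

def create(store, nodes):
    """CREATE: add every node, even if an identical one already exists."""
    store.extend(dict(node) for node in nodes)


def merge(store, nodes, merge_keys):
    """MERGE: match on the merge keys; update a match or add a new node."""
    for node in nodes:
        key = tuple(node[k] for k in merge_keys)
        for existing in store:
            if tuple(existing.get(k) for k in merge_keys) == key:
                existing.update(node)  # matched: update properties in place
                break
        else:
            store.append(dict(node))  # no match: add the node


movies = [{'title': 'Matrix'}, {'title': 'Titanic'}, {'title': 'Matrix'}]

created = []
create(created, movies)

merged = []
merge(merged, movies, merge_keys=['title'])

print(len(created))  # 3
print(len(merged))   # 2
```

With CREATE the repeated 'Matrix' entry ends up in the store twice; with MERGE on the 'title' key it is stored once.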
Graphio uses py2neo in several places. Part of the bulk data loading logic developed for graphio was merged into py2neo. While py2neo is a comprehensive Neo4j library including object-graph mapping, graphio is made to quickly build a Neo4j database from existing data sets.
Install
Use pip to install:
pip install -U graphio
Example
Iterate over a file that contains people and the movies they like and extract nodes and relationships. Contents of example file ‘people.tsv’:
Alice; Matrix,Titanic
Peter; Matrix,Forrest Gump
John; Forrest Gump,Titanic
The goal is to create the following data in Neo4j:
(Person) nodes
(Movie) nodes
(Person)-[:LIKES]->(Movie) relationships
# under the hood py2neo is used to connect to Neo4j
# you always need a py2neo.Graph instance
from py2neo import Graph
graph = Graph()
from graphio import NodeSet, RelationshipSet
# define data sets
people = NodeSet(['Person'], merge_keys=['name'])
movies = NodeSet(['Movie'], merge_keys=['title'])
person_likes_movie = RelationshipSet('LIKES', ['Person'], ['Movie'], ['name'], ['title'])
with open('people.tsv') as my_file:
    for line in my_file:
        # prepare data from the line
        name, titles = line.split(';')
        # split up the movies
        titles = titles.strip().split(',')
        # add one (Person) node per line
        people.add_node({'name': name})
        # add (Movie) nodes and :LIKES relationships
        for title in titles:
            movies.add_node({'title': title})
            person_likes_movie.add_relationship({'name': name}, {'title': title}, {'source': 'my_file'})
# create the nodes in NodeSet, needs a py2neo.Graph instance
people.create(graph)
movies.create(graph)
person_likes_movie.create(graph)
The example follows three steps:
Define the data sets you want to add.
Iterate over a data source, transform the data and add to the data sets.
Store data in Neo4j.
Note
The example creates multiple nodes with the same properties: each movie is added once per person who likes it. You have to take care of uniqueness yourself.
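One way to handle uniqueness on the Python side is to remember which titles were already seen and add each movie only once. The following is a minimal sketch under that assumption; `seen_titles` and `movie_nodes` are illustrative names, and an in-memory string stands in for 'people.tsv' so that the snippet runs without the file or a Neo4j instance. Loading with MERGE instead of CREATE, as mentioned in the introduction, is another option.

```python
import io

# Stand-in for the contents of 'people.tsv' so the sketch runs without the file.
my_file = io.StringIO(
    "Alice; Matrix,Titanic\n"
    "Peter; Matrix,Forrest Gump\n"
    "John; Forrest Gump,Titanic\n"
)

seen_titles = set()  # titles that were already turned into a node
movie_nodes = []     # properties you would pass to movies.add_node(...)

for line in my_file:
    name, titles = line.split(';')
    for title in titles.strip().split(','):
        # only keep the first occurrence of each title
        if title not in seen_titles:
            seen_titles.add(title)
            movie_nodes.append({'title': title})

print(movie_nodes)
# [{'title': 'Matrix'}, {'title': 'Titanic'}, {'title': 'Forrest Gump'}]
```

In the example above, the same check would go around the movies.add_node call, while add_relationship is still called for every person and title pair.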
Continue with the Basic Workflow section.