Basic Workflow
NodeSets
With graphio you predefine the NodeSet and add nodes:
from graphio import NodeSet
people = NodeSet(['Person'], merge_keys=['name'])
people.add_node({'name': 'Peter', 'city': 'Munich'})
The first argument for the NodeSet is a list of labels used for all nodes in this NodeSet. The second, optional argument is merge_keys, a list of properties that confer uniqueness on the nodes in this NodeSet. All operations based on MERGE queries need unique properties to identify nodes.
When you add a node to the NodeSet you can add arbitrary properties to the node.
Uniqueness of nodes
The uniqueness of the nodes is not checked when adding to the NodeSet. Thus, you can create multiple nodes with the same 'name' property.
Use NodeSet.add_unique() to check if a node with the same properties exists already:
people = NodeSet(['Person'], merge_keys=['name'])
# first time
people.add_unique({'name': 'Jack', 'city': 'London'})
len(people.nodes)  # -> 1
# second time
people.add_unique({'name': 'Jack', 'city': 'London'})
len(people.nodes)  # -> 1
Warning
This function iterates over all existing nodes each time a new one is added and does not scale well. Use it only for small NodeSets.
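For larger inputs, one way to avoid the linear scan is to deduplicate before the nodes reach the NodeSet. The sketch below uses plain dictionaries and a set of merge-key tuples; dedupe_by_keys is a hypothetical helper, not part of graphio. Note that it treats two records as duplicates when their merge-key values match, which may be stricter or looser than the comparison add_unique() performs.

```python
def dedupe_by_keys(records, merge_keys):
    """Keep the first record seen for each combination of merge-key values."""
    seen = set()
    unique = []
    for record in records:
        # Build a hashable identity from the merge-key values only.
        identity = tuple(record[key] for key in merge_keys)
        if identity not in seen:
            seen.add(identity)
            unique.append(record)
    return unique


records = [
    {'name': 'Jack', 'city': 'London'},
    {'name': 'Jack', 'city': 'London'},
    {'name': 'Mia', 'city': 'Oslo'},
]
unique_records = dedupe_by_keys(records, ['name'])
```

Each record in unique_records can then be passed to add_node() without a per-insert scan.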
Default properties
You can set default properties on the NodeSet that are added to all nodes when loading data:
people_in_europe = NodeSet(['Person'], merge_keys=['name'],
                           default_props={'continent': 'Europe'})
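Conceptually, default properties behave like a dictionary that is merged into every node at load time. The sketch below models that with plain dictionaries; apply_defaults is a hypothetical helper, not part of graphio, and it assumes a node's own value wins when a key appears in both places, which this page does not specify.

```python
def apply_defaults(nodes, default_props):
    """Return new node dicts with default_props filled in (hypothetical helper)."""
    # Assumption: explicit node properties take precedence over defaults.
    return [{**default_props, **node} for node in nodes]


nodes = [
    {'name': 'Peter', 'city': 'Munich'},
    {'name': 'Aiko', 'continent': 'Asia'},
]
loaded = apply_defaults(nodes, {'continent': 'Europe'})
```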
RelationshipSets
In a similar manner, a RelationshipSet is predefined and you add relationships to it:
from graphio import RelationshipSet
person_likes_food = RelationshipSet('KNOWS', ['Person'], ['Food'], ['name'], ['type'])
person_likes_food.add_relationship(
{'name': 'Peter'}, {'type': 'Pizza'}, {'reason': 'cheese'}
)
The arguments for the RelationshipSet are, in order:
relationship type
labels of start node
labels of end node
property keys to match start node
property keys to match end node
When you add a relationship to the RelationshipSet, all you have to do is define the matching properties for the start node and end node. You can also add relationship properties.
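A relationship is therefore described by three dictionaries: the properties that identify the start node, the properties that identify the end node, and the properties of the relationship itself. The following sketch checks the first two against the match keys using plain Python; check_relationship is a hypothetical helper and not part of graphio.

```python
def check_relationship(start_props, end_props, start_keys, end_keys):
    """Return the match keys that are missing from the start and end dicts."""
    missing_start = [key for key in start_keys if key not in start_props]
    missing_end = [key for key in end_keys if key not in end_props]
    return missing_start, missing_end


# Start nodes match on 'name', end nodes on 'type', as in the example above.
ok = check_relationship({'name': 'Peter'}, {'type': 'Pizza'}, ['name'], ['type'])
bad = check_relationship({'city': 'Munich'}, {'type': 'Pizza'}, ['name'], ['type'])
```

An empty pair of lists means both endpoints carry every property needed for matching.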
Default properties
You can set default properties on the RelationshipSet that are added to all relationships when loading data:
person_likes_food = RelationshipSet('KNOWS', ['Person'], ['Food'], ['name'], ['type'],
                                    default_props={'source': 'survey'})
Create Indexes
Both NodeSet and RelationshipSet allow you to create indexes to speed up data loading.
NodeSet.create_index() creates indexes for all individual merge_keys properties as well as a compound index.
RelationshipSet.create_index() creates the indexes required for matching the start node and end node:
from graphio import RelationshipSet
from py2neo import Graph
graph = Graph()
person_likes_food = RelationshipSet('KNOWS', ['Person'], ['Food'], ['name'], ['type'])
person_likes_food.create_index(graph)
This will create single-property indexes for :Person(name) and :Food(type).
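The set of indexes follows from the RelationshipSet definition. The sketch below derives the targets in the :Label(property) notation used above, assuming one index per combination of node label and match property; index_targets is a hypothetical helper that only builds strings and does not talk to Neo4j.

```python
def index_targets(start_labels, end_labels, start_keys, end_keys):
    """List the label/property pairs that need a single-property index."""
    targets = []
    for labels, keys in ((start_labels, start_keys), (end_labels, end_keys)):
        for label in labels:
            for key in keys:
                targets.append(f':{label}({key})')
    return targets


targets = index_targets(['Person'], ['Food'], ['name'], ['type'])
```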
Load Data
After building NodeSet and RelationshipSet objects you can create or merge everything in Neo4j.
You need a py2neo.Graph instance to create data. See: https://py2neo.org/v4/database.html#the-graph
from py2neo import Graph
graph = Graph()
people.create(graph)
person_likes_food.create(graph)
Warning
Graphio does not check if the nodes referenced in the RelationshipSet actually exist. It is meant to quickly build data sets and throw them into Neo4j, not to maintain consistency.
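If consistency matters, dangling references can be caught before loading. The sketch below works on plain lists of dictionaries rather than graphio objects; find_dangling is a hypothetical helper, and how you obtain the node and relationship data from your own pipeline is left open.

```python
def find_dangling(relationships, start_nodes, end_nodes, start_keys, end_keys):
    """Return relationships whose start or end node is not in the node lists."""
    def identity(props, keys):
        return tuple(props.get(key) for key in keys)

    known_starts = {identity(node, start_keys) for node in start_nodes}
    known_ends = {identity(node, end_keys) for node in end_nodes}

    dangling = []
    for start_props, end_props, rel_props in relationships:
        if (identity(start_props, start_keys) not in known_starts
                or identity(end_props, end_keys) not in known_ends):
            dangling.append((start_props, end_props, rel_props))
    return dangling


people_data = [{'name': 'Peter', 'city': 'Munich'}]
food_data = [{'type': 'Pizza'}]
likes = [
    ({'name': 'Peter'}, {'type': 'Pizza'}, {'reason': 'cheese'}),
    ({'name': 'Jack'}, {'type': 'Pizza'}, {}),   # no Person named Jack
]
dangling = find_dangling(likes, people_data, food_data, ['name'], ['type'])
```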
Create
create() will, as the name suggests, create all data. This will create duplicate nodes even if merge_keys are set on a NodeSet.
Merge
merge() will merge on the merge_keys defined on the NodeSet.
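The difference between the two operations can be modelled with an in-memory list standing in for the database. This is only an illustration of the semantics described here, not how graphio talks to Neo4j; create_nodes and merge_nodes are hypothetical functions.

```python
def create_nodes(store, nodes):
    """Append every node, like CREATE: duplicates are not detected."""
    store.extend(dict(node) for node in nodes)


def merge_nodes(store, nodes, merge_keys):
    """Update the node with matching merge-key values, or add it if none exists."""
    for node in nodes:
        identity = [node[key] for key in merge_keys]
        for existing in store:
            if [existing.get(key) for key in merge_keys] == identity:
                existing.update(node)
                break
        else:
            # No existing node matched, so a new one is added.
            store.append(dict(node))


batch = [{'name': 'Peter', 'city': 'Munich'}]

created = []
create_nodes(created, batch)
create_nodes(created, batch)            # loading twice duplicates Peter

merged = []
merge_nodes(merged, batch, ['name'])
merge_nodes(merged, batch, ['name'])    # second load matches the existing node
```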
The merge operation for NodeSet offers more control. You can pass a list of properties that should not be overwritten on existing nodes:
NodeSet.merge(graph, preserve=['name', 'currency'])
This is equivalent to:
ON CREATE SET ..all properties..
ON MATCH SET ..all properties except 'name' and 'currency'..
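The effect of preserve can be illustrated on a single node held as a dictionary. The sketch is a plain-Python model of the ON CREATE / ON MATCH behaviour described above; merge_with_preserve is a hypothetical function, and the property values are made up for the example.

```python
def merge_with_preserve(existing, incoming, preserve):
    """Model ON CREATE / ON MATCH for one node.

    existing is None when no node matched; otherwise it is the stored node.
    """
    if existing is None:
        # ON CREATE: set all properties.
        return dict(incoming)
    # ON MATCH: set everything except the preserved properties.
    updated = dict(existing)
    for key, value in incoming.items():
        if key not in preserve:
            updated[key] = value
    return updated


stored = {'name': 'Peter', 'currency': 'EUR', 'city': 'Munich'}
incoming = {'name': 'Peter', 'currency': 'USD', 'city': 'Berlin'}
result = merge_with_preserve(stored, incoming, preserve=['name', 'currency'])
```

Here city is overwritten on the matched node, while currency keeps its stored value.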
Graphio can also append properties to arrays:
NodeSet.merge(graph, append_props=['source'])
This will create a list for the node property source and append values ON MATCH.
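A plain-Python model of this behaviour on a single node is sketched below. merge_with_append is a hypothetical function; whether graphio skips values that are already in the list is not stated here, so the sketch appends unconditionally.

```python
def merge_with_append(existing, incoming, append_props):
    """Model a merge where selected properties are stored as lists."""
    if existing is None:
        # ON CREATE: start a one-element list for each append property.
        return {
            key: [value] if key in append_props else value
            for key, value in incoming.items()
        }
    # ON MATCH: append to the list, overwrite everything else.
    updated = dict(existing)
    for key, value in incoming.items():
        if key in append_props:
            updated[key] = list(updated.get(key, [])) + [value]
        else:
            updated[key] = value
    return updated


node = merge_with_append(None, {'name': 'Peter', 'source': 'survey'}, ['source'])
node = merge_with_append(node, {'name': 'Peter', 'source': 'import'}, ['source'])
```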
Both can also be set on the NodeSet:
nodeset = NodeSet(['Person'], ['name'], preserve=['country'], append_props=['source'])
Group Data Sets in a Container
A Container can be used to group NodeSet and RelationshipSet objects:
my_data = Container()
my_data.add(people)
my_data.add(person_likes_food)
Note
This is particularly useful if you build many NodeSet and RelationshipSet objects and want to group data sets (e.g. because of dependencies).
You can iterate over the NodeSet and RelationshipSet objects in the Container:
for nodeset in my_data.nodesets:
    nodeset.create(graph)
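When relationships depend on nodes, a natural loading order is all NodeSets first, then all RelationshipSets. The sketch below shows that order with minimal stand-in classes so it runs without graphio or a database; SimpleContainer and FakeSet are hypothetical, and the relationshipsets attribute is an assumption modelled on the nodesets attribute used above.

```python
class FakeSet:
    """Stand-in for a NodeSet or RelationshipSet that records when it is loaded."""

    def __init__(self, name, is_relationship=False):
        self.name = name
        self.is_relationship = is_relationship

    def create(self, log):
        # The real create() takes a graph; here a list collects the call order.
        log.append(self.name)


class SimpleContainer:
    """Minimal grouping of data sets, split by kind."""

    def __init__(self):
        self.objects = []

    def add(self, dataset):
        self.objects.append(dataset)

    @property
    def nodesets(self):
        return [obj for obj in self.objects if not obj.is_relationship]

    @property
    def relationshipsets(self):
        return [obj for obj in self.objects if obj.is_relationship]


my_data = SimpleContainer()
my_data.add(FakeSet('person_likes_food', is_relationship=True))
my_data.add(FakeSet('people'))

log = []
for nodeset in my_data.nodesets:
    nodeset.create(log)
for relationshipset in my_data.relationshipsets:
    relationshipset.create(log)
```

Regardless of the order in which the data sets were added, the nodes are loaded before the relationships that refer to them.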