Basic Workflow

NodeSets

With graphio you predefine the NodeSet and add nodes:

from graphio import NodeSet

people = NodeSet(['Person'], merge_keys=['name'])

people.add_node({'name': 'Peter', 'city': 'Munich'})

The first argument for the NodeSet is a list of labels used for all nodes in this NodeSet. The second optional argument are merge_keys, a list of properties that confer uniqueness of the nodes in this NodeSet. All operations based on MERGE queries need unique properties to identify nodes.

When you add a node to the NodeSet you can add arbitrary properties to the node.

Uniqueness of nodes

The uniqueness of the nodes is not checked when adding to the NodeSet. Thus, you can create mulitple nodes with the same ‘name’ property.

Use NodeSet.add_unique() to check if a node with the same properties exist already:

people = NodeSet(['Person'], merge_keys=['name'])

# first time
people.add_unique({'name': 'Jack', 'city': 'London'})
len(people.nodes) -> 1

# second time
people.add_unique({'name': 'Jack', 'city': 'London'})
len(people.nodes) -> 1

Warning

This function iterates all nodes when adding a new one and does not scale well. Use only for small nodesets.

Default properties

You can set default properties on the NodeSet that are added to all nodes when loading data:

people_in_europe = NodeSet(['Person'], merge_keys=['name'],
                           default_props={'continent': 'Europe'})

RelationshipSets

In a similar manner, RelationshipSet are predefined and you add relationships:

from graphio import RelationshipSet

person_likes_food = RelationshipSet('KNOWS', ['Person'], ['Food'], ['name'], ['type'])

person_likes_food.add_relationship(
   {'name': 'Peter'}, {'type': 'Pizza'}, {'reason': 'cheese'}
)

The arguments for the RelationshipSet

  • relationship type

  • labels of start node

  • labels of end node

  • property keys to match start node

  • property keys to match end node

When you add a relationship to RelationshipSet all you have to do is to define the matching properties for the start node and end node. You can also add relationship properties.

Default properties

You can set default properties on the RelationshipSet that are added to all relationships when loading data:

person_likes_food = RelationshipSet('KNOWS', ['Person'], ['Food'], ['name'], ['type'],
                                    default_props={'source': 'survey'})

Create Indexes

Both class:~graphio.NodeSet and RelationshipSet allow you to create indexes to speed up data loading. NodeSet.create_index() creates indexes for all individual merge_keys properties as well as a compound index. RelationshipSet.create_index() creates the indexes required for matching the start node and end node:

from graphio import RelationshipSet
from py2neo import Graph

graph = Graph()

person_likes_food = RelationshipSet('KNOWS', ['Person'], ['Food'], ['name'], ['type'])

person_likes_food.create_index(graph)

This will create single-property indexes for :Person(name) and :Food(type).

Load Data

After building NodeSet and RelationshipSet you can create or merge everything in Neo4j.

You need a py2neo.Graph instance to create data. See: https://py2neo.org/v4/database.html#the-graph

from py2neo import Graph

graph = Graph()

people.create(graph)
person_likes_food.create(graph)

Warning

Graphio does not check if the nodes referenced in the RelationshipSet actually exist. It is meant to quickly build data sets and throw them into Neo4j, not to maintain consistency.

Create

create() will, as the name suggests, create all data. This will create duplicate nodes even if a merge_key is set on a NodeSet.

Merge

merge() will merge on the merge_key defined on the NodeSet.

The merge operation for NodeSet offers more control.

You can pass a list of properties that should not be overwritten on existing nodes:

NodeSet.merge(graph, preserve=['name', 'currency'])

This is equivalent to:

ON CREATE SET ..all properties..
ON MATCH SET ..all properties except 'name' and 'currency'..

Graphio can also append properties to arrays:

NodeSet.merge(graph, append_props=['source'])

This will create a list for the node property source and append values ON MATCH.

Both can also be set on the NodeSet:

nodeset = NodeSet(['Person'], ['name'], preserve=['country'], array_props=['source'])

Group Data Sets in a Container

A Container can be used to group NodeSet and RelationshipSet:

my_data = Container()

my_data.add(people)
my_data.add(person_likes_food)

Note

This is particularly useful if you build many NodeSet and RelationshipSet and want to group data sets (e.g. because of dependencies).

You can iterate the NodeSet and RelationshipSet in the Container:

for nodeset in my_data.nodesets:
    nodeset.create(graph)