Main Classes

NodeSet

class graphio.NodeSet(labels, merge_keys=None, batch_size=None, default_props=None, preserve=None, append_props=None, indexed=False)

Container for a set of Nodes with the same labels and the same properties that define uniqueness.

Parameters
  • labels (list[str]) – The labels for the nodes in this NodeSet.

  • merge_keys (list[str]) – The properties that define uniqueness of the nodes in this NodeSet.

  • batch_size (int) – Batch size for Neo4j operations.

add_node(properties)

Create a node in this NodeSet.

Parameters

properties (dict) – Node properties.

add_unique(properties)

Add a node to this NodeSet only if a node with the same merge_keys does not exist yet.

Note: This function currently iterates over all nodes in the NodeSet, which is slow for large numbers of nodes. A better solution would be to create an ‘index’, as is done for RelationshipSet.

Parameters

properties (dict) – Node properties.
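The ‘index’ idea mentioned in the note can be sketched in plain Python. This is only an illustration of the technique, not graphio's implementation: the class IndexedNodeStore and its attributes are hypothetical, and the sketch assumes that merge key values are hashable.

```python
class IndexedNodeStore:
    """Hypothetical stand-in for a node container; not part of graphio."""

    def __init__(self, merge_keys):
        self.merge_keys = merge_keys
        self.nodes = []
        # Set of merge-key value tuples seen so far (assumes hashable values).
        self._index = set()

    def _key(self, properties):
        # Build the lookup key from the merge key values only.
        return tuple(properties.get(k) for k in self.merge_keys)

    def add_unique(self, properties):
        """Add the node only if its merge-key values are new; return True if added."""
        key = self._key(properties)
        if key in self._index:
            return False
        self._index.add(key)
        self.nodes.append(properties)
        return True


store = IndexedNodeStore(merge_keys=["name"])
first = store.add_unique({"name": "Peter", "age": 40})
second = store.add_unique({"name": "Peter", "age": 41})  # same merge key, skipped
```

The set lookup replaces the scan over all nodes, so each uniqueness check takes roughly constant time regardless of how many nodes are stored.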

all_properties_in_nodeset()

Return a set of all property keys in this NodeSet.

Returns

A set of unique property keys of a NodeSet
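Collecting the unique property keys amounts to a set union over the property dictionaries. The following sketch shows that idea on plain dictionaries; the function name collect_property_keys and the sample data are assumptions for illustration and do not call graphio.

```python
def collect_property_keys(nodes):
    """Return the set of all keys used in a list of property dictionaries."""
    keys = set()
    for properties in nodes:
        keys.update(properties.keys())
    return keys


# Hypothetical sample data: nodes do not need to share the same properties.
sample_nodes = [
    {"name": "Peter", "age": 40},
    {"name": "Anna", "city": "Berlin"},
]
property_keys = collect_property_keys(sample_nodes)
```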

create(graph, database: Optional[str] = None, batch_size=None)

Create all nodes from NodeSet.

create_index(graph, database=None)

Create indices for all label/merge key combinations as well as a composite index if multiple merge keys exist.

In Neo4j 3.x recreation of an index did not raise an error. In Neo4j 4 you cannot create an existing index.

Index creation syntax changed from Neo4j 3.5 to 4. So far the old syntax is still supported. All py2neo functions (v4.4) work on both versions.

classmethod from_csv_json_set(csv_file_path, json_file_path, load_items: bool = False)

Read the default CSV/JSON file combination.

Needs paths to CSV and JSON file.

Parameters
  • csv_file_path – Path to the CSV file.

  • json_file_path – Path to the JSON file.

  • load_items – Yield items from file (False, default) or load them to memory (True).

Returns

The NodeSet.
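The difference between yielding items and loading them into memory can be illustrated with the standard library alone. The sketch below is not graphio's reader: the helper names iter_rows and read_rows and the sample file are hypothetical, and the actual CSV/JSON layout used by graphio is not assumed here.

```python
import csv
import os
import tempfile


def iter_rows(csv_path):
    # Generator: the file is opened on first iteration and rows are read one by one.
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            yield row


def read_rows(csv_path, load_items=False):
    """Return a lazy iterator (default) or a fully loaded list of rows."""
    rows = iter_rows(csv_path)
    return list(rows) if load_items else rows


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "nodes.csv")
    with open(path, "w", newline="") as handle:
        handle.write("name,age\nPeter,40\nAnna,35\n")

    eager = read_rows(path, load_items=True)  # all rows held in memory
    lazy = read_rows(path)                    # nothing read yet
    lazy_rows = list(lazy)                    # rows are read only now
```

The lazy variant keeps memory use low for large files, but it can be consumed only once and requires the file to stay available until iteration finishes.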

merge(graph, merge_properties=None, batch_size=None, preserve=None, append_props=None, database=None)

Merge nodes from NodeSet on merge properties.

Parameters

merge_properties – The merge properties.

node_properties()

Yield properties of the nodes in this set. Used for create function.

object_file_name(suffix: Optional[str] = None) str

Create a unique name for this NodeSet that indicates content. Pass an optional suffix. NOTE: suffix has to include the ‘.’ for a filename!

nodeset_Label_merge-key_uuid

With suffix:

nodeset_Label_merge-key_uuid.json
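A name following this pattern could be assembled as shown below. This is a sketch under assumptions: the helper nodeset_file_name is hypothetical, and how graphio joins multiple labels or merge keys is not specified here, so an underscore is used purely for illustration.

```python
import uuid


def nodeset_file_name(labels, merge_keys, suffix=None):
    """Build 'nodeset_<labels>_<merge keys>_<uuid>' with an optional suffix."""
    # Joining with '_' is an assumption made for this sketch.
    name = "nodeset_{}_{}_{}".format(
        "_".join(labels), "_".join(merge_keys), uuid.uuid4()
    )
    # The suffix is appended verbatim, so it must already contain the '.'.
    return name + suffix if suffix else name


plain_name = nodeset_file_name(["Person"], ["name"])
json_name = nodeset_file_name(["Person"], ["name"], suffix=".json")
```

Because the suffix is appended as-is, passing "json" instead of ".json" would produce a name without a file extension separator.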

serialize(target_dir: str)

Serialize NodeSet to a JSON file in a target directory.

This function is meant for dumping/reloading and not for creating a general transport format. The function will likely be optimized for disk space or compressed in the future.

to_csv(filepath: str, filename: Optional[str] = None, quoting: Optional[int] = None) str

Create a CSV file for this NodeSet.

Parameters
  • filepath – Path where the file is stored.

  • filename – Optional filename. A filename is generated automatically if none is passed.

  • quoting – Optional quoting setting for the csv writer (any of csv.QUOTE_MINIMAL, csv.QUOTE_NONE, csv.QUOTE_ALL etc.).
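The quoting constants come from Python's standard csv module. The sketch below shows their effect on a small sample using csv.writer directly; the helper render and the sample rows are hypothetical, and the example does not call to_csv itself.

```python
import csv
import io


def render(rows, quoting):
    """Write rows with the given quoting mode and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=quoting)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["name", "age"], ["Peter", 40]]

minimal = render(rows, csv.QUOTE_MINIMAL)        # quotes only where needed
quote_all = render(rows, csv.QUOTE_ALL)          # quotes every field
nonnumeric = render(rows, csv.QUOTE_NONNUMERIC)  # quotes every non-numeric field
```

With csv.QUOTE_NONE, fields containing the delimiter need an escapechar to be set on the writer, otherwise the csv module raises an error.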

to_dict()

Create a dictionary defining the NodeSet.

update_node(properties: dict)

Update an existing node by overwriting all properties.

Note that this requires NodeSet(…, indexed=True) which is not the default!

Parameters

properties – Node property dictionary.

RelationshipSet

class graphio.RelationshipSet(rel_type, start_node_labels, end_node_labels, start_node_properties, end_node_properties, batch_size=None, default_props=None)

Container for a set of Relationships with the same type of start and end nodes.

Parameters
  • rel_type – Relationship type.

  • start_node_labels – Labels of the start node.

  • end_node_labels – Labels of the end node.

  • start_node_properties – Property keys to identify the start node.

  • end_node_properties – Property keys to identify the end node.

  • batch_size – Batch size for Neo4j operations.

add_relationship(start_node_properties: dict, end_node_properties: dict, properties: Optional[dict] = None)

Add a relationship to this RelationshipSet.

Parameters

properties – Relationship properties.

all_property_keys() Set[str]

Return a set of all property keys in this RelationshipSet.

Returns

A set of unique property keys of a RelationshipSet

create(graph, database=None, batch_size=None)

Create relationships in this RelationshipSet.

py2neo bulk operations work with tuples and depend on the order of elements in the tuple, whereas the underlying Relationship used in the RelationshipSet uses a dictionary. This is worked around for now; the RelationshipSet.add_relationship() method may be adapted later.

create_index(graph, database=None)

Create indices for the start node and end node definitions of this RelationshipSet. If more than one start or end node property is defined, all single property indices as well as the composite index are created.

In Neo4j 3.x recreation of an index did not raise an error. In Neo4j 4 you cannot create an existing index.

Index creation syntax changed from Neo4j 3.5 to 4. So far the old syntax is still supported. All py2neo functions (v4.4) work on both versions.

csv_query(query_type: str, filename: Optional[str] = None, periodic_commit=1000) str

Generate the CREATE CSV query for this RelationshipSet. The function tries to take care of type conversions.

Note: You can’t use arrays as properties for nodes/relationships when creating CSV files.

LOAD CSV WITH HEADERS FROM xyz AS line
MATCH (a:Gene), (b:Protein)
WHERE a.sid = line.a_sid AND b.sid = line.b_sid AND b.taxid = line.b_taxid
CREATE (a)-[r:MAPS]->(b)
SET r.key1 = line.rel_key1, r.key2 = line.rel_key2
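How such a query can be assembled from the RelationshipSet definition is sketched below. This is not the code used by csv_query: the function build_create_query and its parameters are hypothetical, type conversions are left out, and the a_/b_/rel_ column prefixes are taken from the example query above.

```python
def build_create_query(rel_type, start_label, end_label,
                       start_props, end_props, rel_props, source="xyz"):
    """Assemble a LOAD CSV ... CREATE query as a single-line string."""
    # One WHERE condition per identifying property of the start and end node.
    where = [f"a.{p} = line.a_{p}" for p in start_props]
    where += [f"b.{p} = line.b_{p}" for p in end_props]
    # One SET assignment per relationship property.
    sets = [f"r.{p} = line.rel_{p}" for p in rel_props]

    query = (
        f"LOAD CSV WITH HEADERS FROM {source} AS line "
        f"MATCH (a:{start_label}), (b:{end_label}) "
        f"WHERE {' AND '.join(where)} "
        f"CREATE (a)-[r:{rel_type}]->(b)"
    )
    if sets:
        query += " SET " + ", ".join(sets)
    return query


query = build_create_query(
    "MAPS", "Gene", "Protein",
    start_props=["sid"], end_props=["sid", "taxid"],
    rel_props=["key1", "key2"],
)
```

With these inputs the sketch reproduces the example query above as one line of text.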

classmethod from_csv_json_set(csv_file_path, json_file_path, load_items: bool = False)

Read the default CSV/JSON file combination.

Needs paths to CSV and JSON file.

Parameters
  • csv_file_path – Path to the CSV file.

  • json_file_path – Path to the JSON file.

  • load_items – Yield items from file (False, default) or load them to memory (True).

Returns

The RelationshipSet.

merge(graph, database=None, batch_size=None)

Merge relationships in this RelationshipSet.

object_file_name(suffix: Optional[str] = None) str

Create a unique name for this RelationshipSet that indicates content. Pass an optional suffix. NOTE: suffix has to include the ‘.’ for a filename!

relationshipset_StartLabel_TYPE_EndLabel_uuid

With suffix:

relationshipset_StartLabel_TYPE_EndLabel_uuid.json

serialize(target_dir: str)

Serialize RelationshipSet to a JSON file in a target directory.

This function is meant for dumping/reloading and not for creating a general transport format. The function will likely be optimized for disk space or compressed in the future.

to_csv(filepath: str, filename: Optional[str] = None, quoting: Optional[int] = None) str

Note: You can’t use arrays as properties for nodes/relationships when creating CSV files.

LOAD CSV WITH HEADERS FROM xyz AS line
MATCH (a:Gene), (b:GeneSymbol)
WHERE a.sid = line.a_sid AND b.sid = line.b_sid AND b.taxid = line.b_taxid
CREATE (a)-[r:MAPS]->(b)
SET r.key1 = line.rel_key1, r.key2 = line.rel_key2

# CSV file header
a_sid, b_sid, b_taxid, rel_key1, rel_key2
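The column naming in the header above (a_ for start node properties, b_ for end node properties, rel_ for relationship properties) can be illustrated with the standard csv module. This is a sketch, not graphio's writer: the helper relationship_row and the sample values are hypothetical.

```python
import csv
import io


def relationship_row(start_props, end_props, rel_props):
    """Flatten one relationship into a single CSV row with prefixed columns."""
    row = {f"a_{key}": value for key, value in start_props.items()}
    row.update({f"b_{key}": value for key, value in end_props.items()})
    row.update({f"rel_{key}": value for key, value in rel_props.items()})
    return row


rows = [
    relationship_row(
        {"sid": "g1"},
        {"sid": "s1", "taxid": "9606"},
        {"key1": "x", "key2": "y"},
    )
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Each value ends up in one flat text column, which is consistent with the note that arrays cannot be used as properties when creating CSV files.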

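Parameters
  • filepath – Path where the file is stored.

  • filename – Optional filename. A filename is generated automatically if none is passed.

  • quoting – Optional quoting setting for the csv writer (any of csv.QUOTE_MINIMAL, csv.QUOTE_NONE, csv.QUOTE_ALL etc.).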

Container

class graphio.Container(objects=None)

A container for a collection of Nodes, Relationships, NodeSets and RelationshipSets.

A typical parser function (e.g. one that reads an Excel file) produces mixed output, which then has to be processed accordingly.

Also, sanity checks and data statistics are useful.

merge_nodesets()

Merge all node sets if merge_key is defined.

property nodesets

Get the NodeSets in the Container.

property relationshipsets

Get the RelationshipSets in the Container.
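The way a container can separate mixed parser output by type is sketched below. None of these classes are graphio's: FakeNodeSet, FakeRelationshipSet and MiniContainer are hypothetical stand-ins that only illustrate filtering a mixed list through properties.

```python
class FakeNodeSet:
    """Hypothetical placeholder for a NodeSet."""


class FakeRelationshipSet:
    """Hypothetical placeholder for a RelationshipSet."""


class MiniContainer:
    """Hypothetical container that filters mixed objects by type."""

    def __init__(self, objects=None):
        self.objects = list(objects) if objects else []

    @property
    def nodesets(self):
        return [o for o in self.objects if isinstance(o, FakeNodeSet)]

    @property
    def relationshipsets(self):
        return [o for o in self.objects if isinstance(o, FakeRelationshipSet)]


# Mixed output as a parser function might return it.
parser_output = [FakeNodeSet(), FakeRelationshipSet(), FakeNodeSet()]
container = MiniContainer(parser_output)
node_count = len(container.nodesets)
relationship_count = len(container.relationshipsets)
```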