As the book is still in beta and incomplete, I am skipping CouchDB (the chapter is not there yet in beta 2.0) and will spend this week with Neo4j.
Neo4j is a graph database, meaning it focuses on navigation between vertices (called nodes in Neo4j) through edges (called relationships). While other databases make it possible to join various pieces of data, Neo4j treats this as its main semantic mechanism.
Neo4j can be distributed for high availability and is partition tolerant, but sharding is not supported (at the time of writing).
The first day focuses on basic creation and navigation of data. Nodes and relationships are the basic entities; by default nodes have just an id, while relationships are identified by the out and in nodes, and a type.
To spice this up a bit, it is possible to attach properties to both nodes and relationships. Values can be scalar or arrays of basic types (boolean, number, or string).
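To make the data model concrete, here is a minimal sketch in plain Python (not Neo4j's API) of nodes that carry only an id by default, relationships identified by their out node, in node, and type, and optional properties on both. The class names, the field names, and the sample data (Alice, Riesling, likes) are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Property values: a scalar, or an array of scalars (boolean, number, string).
Scalar = Union[bool, int, float, str]
Value = Union[Scalar, List[Scalar]]


@dataclass
class Node:
    id: int  # by default a node has nothing but its id
    properties: Dict[str, Value] = field(default_factory=dict)


@dataclass
class Relationship:
    out_node: int  # id of the node the relationship leaves
    in_node: int   # id of the node the relationship points to
    type: str
    properties: Dict[str, Value] = field(default_factory=dict)


def build_sample():
    """Build a two-node graph with one typed relationship (hypothetical data)."""
    alice = Node(0, {"name": "Alice"})
    wine = Node(1, {"name": "Riesling", "tags": ["white", "dry"]})
    likes = Relationship(alice.id, wine.id, "likes", {"rating": 5})
    return [alice, wine], [likes]
```

Calling build_sample() gives a list of nodes and a list of relationships; a bare Node(7) has an empty property map, which mirrors the "just an id" default.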
To navigate the data, the easiest option seems to be Gremlin, a graph traversal language that is independent of both the host language and the database (although the host language has to be a JVM one).
Exercises
Neo4j Wiki
The Wiki is here.
Gremlin Documentation
There is a wiki.
List of Gremlin Steps
They are listed on the wiki.
Neo4j Shells
It is hard not to find them, as they’re already in the Web Admin Console. Both Cypher and the ReST API can be used directly from the console, although the ReST API is limited there (for instance, the traverse operation is not supported). Full access requires an external client such as curl.
Find all node names with another shell
In Cypher, there is no direct way to use all nodes as a starting point, so instead I try to find all nodes linked to the first one through a path that can be empty (i.e. the starting node is also included). To remove duplicates, I use the DISTINCT function, but it must be applied in the context of an aggregation, so I have to apply COLLECT as well:
[query listing missing]
which produces
[output listing missing]
Not exactly as easy as the Gremlin equivalent g.V.name.
There is no way to achieve anything similar using the ReST API, as its traversal operation only returns full objects (either nodes, relationships or paths), and not properties.
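The logic behind the Cypher approach (start from one node, follow paths of any length including the empty one, then keep only distinct names) can be sketched in plain Python. This is an illustration of the idea, not the query itself: the function name, the edge-list representation, and the choice to follow relationships in both directions are all assumptions.

```python
from collections import deque


def reachable_names(start, edges, names):
    """Return the sorted distinct names of all nodes reachable from `start`.

    `edges` is a list of (out_id, in_id) pairs and `names` maps node ids to
    names. Relationships are followed in both directions (an assumption),
    and the empty path is allowed, so the start node itself is included.
    """
    # Build an undirected adjacency map from the (out, in) pairs.
    neighbours = {}
    for out_id, in_id in edges:
        neighbours.setdefault(out_id, set()).add(in_id)
        neighbours.setdefault(in_id, set()).add(out_id)

    # Breadth-first search; `seen` starts with the start node (empty path).
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbours.get(node, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)

    # The set comprehension plays the role of DISTINCT;
    # nodes without a name property are skipped.
    return sorted({names[n] for n in seen if n in names})
```

For example, with edges [(0, 1), (1, 2)] and two nodes sharing the same name, the shared name appears only once in the result, and nodes in a disconnected part of the graph are never visited. That last point is the practical difference from g.V.name, which starts from every vertex.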
Delete all the nodes and edges in your database
Well, the book already showed the powerful g.clear Gremlin command. It should be followed by g.addVertex() to get back to the original state (with just one node).
And that’s all for today.