Wakatta!

Like Eureka!, only cooler

Seven Databases in Seven Weeks Neo4j Day 1

As the book is still in beta and incomplete, I am skipping CouchDB (the chapter is not there yet in beta 2.0) and will spend this week with Neo4j.

Neo4j is a graph database, meaning it focuses on navigation between vertices (called nodes in Neo4j) through edges (called relationships). While other databases make it possible to join various pieces of data, Neo4j treats these connections as its main semantic mechanism.

Neo4j can be distributed for high availability and is partition tolerant, but sharding is not supported (at the time of writing).

The first day focuses on basic creation and navigation of data. Nodes and relationships are the basic entities; by default nodes have just an id, while relationships are identified by the out and in nodes, and a type.

To spice this up a bit, it is possible to attach properties to both nodes and relationships. Values can be scalar or arrays of basic types (boolean, number, or string).
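To make the data model concrete, here is a minimal Python sketch of it. This is not Neo4j code, only an in-memory illustration of the description above; the class names, the "produced" relationship type, and the "tags" and "since" properties are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Property values are scalars or arrays of basic types (boolean, number, string).
Scalar = Union[bool, int, float, str]
PropertyValue = Union[Scalar, List[Scalar]]


@dataclass
class Node:
    # By default a node has just an id; properties are optional.
    id: int
    properties: Dict[str, PropertyValue] = field(default_factory=dict)


@dataclass
class Relationship:
    # A relationship is identified by its out node, its in node, and a type.
    out_node: int
    in_node: int
    type: str
    properties: Dict[str, PropertyValue] = field(default_factory=dict)


# Hypothetical sample data; only the two names come from the query output in this post.
winery = Node(0, {"name": "Prancing Wolf Winery"})
wine = Node(1, {"name": "Prancing Wolf Ice Wine 2007", "tags": ["sweet", "dessert"]})
produced = Relationship(out_node=winery.id, in_node=wine.id, type="produced",
                        properties={"since": 2007})
```

A bare Node(2) is valid here as well, which mirrors the fact that properties are entirely optional.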

To navigate the data, the easiest option seems to be Gremlin, a graph traversal language that is independent of both the database and the host language (although the host language has to be a JVM one).

Exercises

Neo4j Wiki

The Wiki is here.

Gremlin Documentation

There is a wiki.

List of Gremlin Steps

They are listed on the wiki.

Neo4j Shells

It is hard not to find them, as they’re already in the Web Admin Console. Both Cypher and the ReST API can be used directly from the console, although the ReST API is limited there (for instance, the traverse operation is not supported). Full access requires an external client such as curl.

Find all node names with another shell

In Cypher, there is no direct way to use all nodes as a starting point, so instead I try to find all nodes linked to the first one through a path that can be empty (i.e. the starting node is also included). To remove duplicates, I use DISTINCT, but it must be applied in the context of an aggregation, so I have to apply COLLECT as well:

START n=node(0)
MATCH n-[*0..]-x
RETURN COLLECT(DISTINCT x.name)

which produces

==> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | collect(distinct x.name)                                                                                                                                             |
==> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | List(Prancing Wolf Ice Wine 2007, riesling, Prancing Wolf Spatleses 2007, Prancing Wolf Winery, Prancing Wolf Kabinett 2002, Tom, Wine Expert Monthly, Patty, Alice) |
==> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Not exactly as easy as the Gremlin equivalent g.V.name.
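The difference between the two approaches can be sketched in plain Python. This is not Neo4j code: the toy graph, the relationship types, and the function name are invented for the illustration. The point is that a pattern anchored at node 0 only reaches nodes connected to it, whereas listing all vertices does not depend on connectivity.

```python
from collections import deque

# Hypothetical toy graph: node id -> name, and relationships as (out, in, type).
names = {
    0: "Prancing Wolf Winery",
    1: "Prancing Wolf Ice Wine 2007",
    2: "riesling",
    3: "Tom",  # deliberately left without any relationship
}
relationships = [(0, 1, "produced"), (1, 2, "grape_type"), (0, 1, "sells")]


def reachable_names(start, names, relationships):
    """Names of all nodes reachable from start over paths of length 0 or more,
    ignoring relationship direction, with each node reported once."""
    neighbours = {node_id: set() for node_id in names}
    for out_node, in_node, _type in relationships:
        neighbours[out_node].add(in_node)
        neighbours[in_node].add(out_node)

    seen = {start}      # the empty path: the start node itself is included
    order = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for candidate in sorted(neighbours[current]):
            if candidate not in seen:   # plays the role of DISTINCT
                seen.add(candidate)
                order.append(candidate)
                queue.append(candidate)
    return [names[node_id] for node_id in order]


from_first_node = reachable_names(0, names, relationships)
all_names = list(names.values())  # what listing every vertex would give
```

In this toy graph "Tom" appears in all_names but not in from_first_node, so the path-based query is only a substitute for listing all nodes when the graph is connected.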

There is no way to achieve anything similar using the ReST API, as its traversal operation only returns full objects (either nodes, relationships or paths), and not properties.

Delete all the nodes and edges in your database

Well, the book already showed the powerful g.clear Gremlin command. It should be followed by g.addVertex() to get back to the original state (with just one node).

And that’s all for today.
