The second database is Riak, a distributed key-value store. Key-value stores are not really new (many property or configuration files are really basic key-value stores, and Berkeley DB has long been a good choice for those who needed something a little bit more robust than simple files but not as complex as a relational database).
Still, going from this simple concept to a distributed store that can execute data processing on each of its nodes requires more than simply scaling things up, and I hope that this book will give me some idea of how such a store (and the other NoSQL databases) might fit in the solution landscape.
But that is probably getting a bit ahead of myself; right now I’d be happy just to know how to use Riak.
The client can be the simple cURL command, as Riak’s interface is based on HTTP. This simplifies the technical stack, but pushes some of the complexity onto the client. It is clear that Riak will not provide anything as easy and convenient as PostgreSQL’s
Riak’s basic API is a REST-based CRUD (with Create being pretty much the same as Update). Additional attributes, such as meta-data or the more important links, are passed as headers in the HTTP request.
It is simple, but somewhat inconvenient: there is no concept of partial update. When you want to update an object, you need to pass all the relevant data: meta-data, links, and content. Forget to mention one, and Riak will forget it too.
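To make the "no partial update" point concrete, here is a minimal Python sketch of what a full PUT has to carry. It assumes Riak's X-Riak-Meta-* headers for meta-data and its Link header with a riaktag attribute; the node address (localhost:8091), the photos bucket, the key names, the meta-data entry, and the JSON body are all hypothetical values chosen for illustration. The request is only built, not sent.

```python
from urllib.request import Request


def riak_headers(content_type, meta=None, links=None):
    """Collect every header a full Riak PUT has to restate."""
    headers = {"Content-Type": content_type}
    # User meta-data travels as X-Riak-Meta-* headers.
    for name, value in (meta or {}).items():
        headers["X-Riak-Meta-" + name] = value
    # Links travel in a single Link header, one entry per target.
    if links:
        headers["Link"] = ", ".join(
            '</riak/{}/{}>; riaktag="{}"'.format(bucket, key, tag)
            for bucket, key, tag in links
        )
    return headers


def build_put(base_url, bucket, key, body, headers):
    """Build (without sending) the PUT request for one object."""
    url = "{}/riak/{}/{}".format(base_url, bucket, key)
    return Request(url, data=body, headers=headers, method="PUT")


# Hypothetical values for illustration only.
headers = riak_headers(
    "application/json",
    meta={"Color": "white"},
    links=[("photos", "polly.jpg", "photo")],
)
request = build_put(
    "http://localhost:8091", "animals", "polly",
    b'{"nickname": "Polly"}', headers,
)
# urllib.request.urlopen(request) would send it to a running node.
```

Leaving out the meta or links argument on a later call would produce a request without those headers, which is exactly how an attribute ends up forgotten.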
When was this book written?
I had noticed that the book refers to PostgreSQL 9.0 when 9.1 has been out for a while. In this chapter on Riak, the author uses an apparently old format for the URLs, /riak/bucket/key, whereas the official documentation recommends /buckets/bucket/keys/key (for instance, /buckets/animals/keys/polly). Both formats can be used and are interoperable, but there is no need to teach already deprecated formats.
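The mapping between the two formats is mechanical. A small Python sketch of it (the function name is my own, and it only handles plain bucket/key paths):

```python
def to_new_style(path):
    """Rewrite /riak/<bucket>/<key> as /buckets/<bucket>/keys/<key>."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "riak":
        raise ValueError("expected /riak/<bucket>/<key>, got " + repr(path))
    _, bucket, key = parts
    return "/buckets/{}/keys/{}".format(bucket, key)


print(to_new_style("/riak/animals/polly"))
```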
Of course, I found out about this new format after I completed all the exercises for today. So I will still use the old format for today.
Presumably this will be fixed by the time the book gets published.
Simple but useful trick
Reading unformatted JSON data can be difficult. I found that Python provides a simple way to pretty print JSON output:
(there are certainly other tools. Python is just the first one I came upon). To turn this into a simple, easy to use command, I added this to my
That way, I can just pipe the output of
This unfortunately does not work with curl’s additional output (such as HTTP headers).
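One possible way around the header problem, as a rough Python sketch: pretty-print the JSON body and pass any leading HTTP headers through untouched. It relies only on the standard json module (which also ships a command-line formatter, json.tool). The function name is hypothetical, and the sketch assumes a single block of headers followed by a blank line, as produced by curl -i for a simple response.

```python
import json


def pretty_body(raw):
    """Pretty-print JSON text, keeping any leading HTTP headers as they are."""
    text = raw.replace("\r\n", "\n")
    head, sep, body = text.partition("\n\n")
    # curl -i output starts with a status line such as "HTTP/1.1 200 OK".
    if text.lstrip().startswith("HTTP/") and sep:
        return head + "\n\n" + json.dumps(json.loads(body), indent=2)
    # No headers: treat the whole input as JSON.
    return json.dumps(json.loads(text), indent=2)


print(pretty_body('{"nickname": "Polly", "breed": "Purebred"}'))
```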
As always, Wikipedia is very useful.
Differences between the dev1, dev2 and dev3 servers
The only difference is the port number. But there is some intelligence in the startup script to map each server to its own directory for permanent storage.
Link from Polly to her picture
This creates the link. Note, as I mentioned above, that the content needs to be repeated. Putting no body would cause a curl error (as an HTTP PUT request must have a body):
The image can be retrieved from Polly by following the link:
or, using the new format:
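As a sketch of how link-walking URLs are put together in both formats: each step appended to the object's path is a bucket,tag,keep triple, with _ as a wildcard. The helper name and the photo tag below are hypothetical.

```python
def link_walk_path(bucket, key, steps, new_style=False):
    """Build a link-walking path; steps is a list of (bucket, tag, keep)."""
    if new_style:
        base = "/buckets/{}/keys/{}".format(bucket, key)
    else:
        base = "/riak/{}/{}".format(bucket, key)
    # Each step becomes one extra path segment: /bucket,tag,keep
    return base + "".join("/{},{},{}".format(b, t, k) for b, t, k in steps)


# Follow any link tagged "photo" (hypothetical tag) from Polly.
print(link_walk_path("animals", "polly", [("_", "photo", "_")]))
print(link_walk_path("animals", "polly", [("_", "photo", "_")], new_style=True))
```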
POST a new type of document
Here I upload the Seven Databases in Seven Weeks (legal) PDF:
I use the -i option to retrieve the HTTP headers of the response and get the generated key. The command above has this output:
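For illustration, the generated key is the last path segment of the Location header in the response. A minimal Python sketch of pulling it out of the header text; the sample response, bucket name, and key are made up, not actual Riak output.

```python
def generated_key(header_text):
    """Return the key from a Location header, or None if there is none."""
    for line in header_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            # The key is whatever follows the last slash of the path.
            return value.strip().rstrip("/").rsplit("/", 1)[-1]
    return None


# Hypothetical response headers for illustration only.
sample = (
    "HTTP/1.1 201 Created\r\n"
    "Location: /riak/docs/EXAMPLEKEY\r\n"
    "Content-Length: 0\r\n"
)
print(generated_key(sample))
```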
Otherwise, I could list the keys for this bucket:
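A short Python sketch of the key-listing step, assuming the keys=true query parameter and a JSON response that carries the keys in a "keys" array. The bucket name and the sample response are hypothetical.

```python
import json


def list_keys_path(bucket, new_style=False):
    """Path that asks Riak to list the keys of a bucket."""
    if new_style:
        return "/buckets/{}/keys?keys=true".format(bucket)
    return "/riak/{}?keys=true".format(bucket)


def keys_from_listing(body):
    """Extract the sorted list of keys from the JSON listing."""
    return sorted(json.loads(body).get("keys", []))


# Hypothetical listing for illustration only.
sample_listing = '{"props": {"name": "docs"}, "keys": ["b-key", "a-key"]}'
print(list_keys_path("docs"))
print(keys_from_listing(sample_listing))
```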
I can use the URL below to retrieve the document in a browser:
PUT a medicine image and link to Ace
Once again, nothing too complex, but everything has to be done at the same time, as partial updates are not possible:
Then the image itself can be retrieved at:
Finally, I can get the poor patient by following links:
And this completes Day 1. The basic REST API is not complex, but its simplicity cuts both ways. There is a lot of typing required; I expect client libraries to be much easier to use, at the cost of having to write an application or script to do anything.
Tomorrow will cover MapReduce in the context of Riak.