Day 2 is about Views in CouchDB, which serve as an introduction to the
more general MapReduce support.
It is another fairly short day, as much of this section is actually
about the complexities of XML parsing…
Like Riak and MongoDB, CouchDB is scripted with JavaScript, so today has
a feeling of déjà vu.
View concept
A View is just a mapping of a key to a value. Keys and values are
extracted from documents; there can be more than one key for each
document, as in MongoDB.
Once the view has been built and updated for the documents it applies
to, it can be accessed by key using optimized methods (all based on
some form of lexicographical order).
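As a rough illustration of the idea, here is a minimal sketch of how a view could be built from a map function. The `buildView` helper, the `tags` field and the sample documents are hypothetical, and plain string comparison stands in for CouchDB's actual collation rules.

```javascript
// Sketch of how a view is built: a map function emits (key, value) rows,
// and the rows are kept sorted by key.
// buildView and the sample documents are hypothetical, for illustration only.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // CouchDB exposes emit() as a global inside map functions;
    // here it is passed in explicitly so the sketch runs on its own.
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // Plain string comparison stands in for CouchDB's real collation rules.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// One document can emit several keys: here, one row per tag.
function mapByTag(doc, emit) {
  for (const tag of doc.tags || []) {
    emit(tag, doc.name);
  }
}

const docs = [
  { _id: "a1", name: "First Album", tags: ["rock", "indie"] },
  { _id: "a2", name: "Second Album", tags: ["jazz"] },
];

const view = buildView(docs, mapByTag);
console.log(view.map((row) => row.key)); // keys in sorted order
```

Because the rows are stored in key order, looking up one key or a range of keys does not require scanning every document.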
View performance
A View in CouchDB is essentially the equivalent of a materialized view
in relational databases.
Access to the view causes it to be updated (i.e. recomputed) if
necessary, which can be a painfully slow experience. I had imported
the whole content of the music database (26990 records), and each time
I tested a Temporary View or saved a Permanent one, I had to wait for
CouchDB to finish the refresh (fortunately not too long on this
dataset).
It is interesting to note that relational databases require the
schema to be designed ahead of time but support arbitrary queries,
whereas CouchDB lets you ignore the schema but needs you to design the
queries ahead of time.
Exercises
emit function
The key can be any JSON value, although I would say that only strings
and arrays of simple values have sensible semantics.
Arrays can be used with reduce functions to provide query time custom
grouping, as explained
here.
For instance, to compute the number of records by date, I used the
releasedate of each album to create a key array
[year, month, date], and a value of 1 (1 for each album):
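A minimal sketch of such a map function follows. It assumes that each document holds an `albums` array and that `releasedate` is an ISO-8601 string; both are assumptions about the data layout, and the sample document is made up. The small `emit` harness only exists so the sketch runs outside CouchDB, where `emit` is provided as a global.

```javascript
// Stand-in for CouchDB's global emit(), so the map function can run in Node.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function in the shape CouchDB expects: function (doc) { ... emit(...) }.
// Assumption: albums are nested under doc.albums and releasedate looks like
// "2004-12-28T18:46:23+01:00".
const mapAlbumsByDate = function (doc) {
  (doc.albums || []).forEach(function (album) {
    var match = /^(\d{4})-(\d{2})-(\d{2})/.exec(album.releasedate || "");
    if (match) {
      // Key is [year, month, day]; value is 1, one per album.
      emit([Number(match[1]), Number(match[2]), Number(match[3])], 1);
    }
  });
};

// Hypothetical sample document; the last album has no release date and is skipped.
const artist = {
  _id: "artist-1",
  albums: [
    { name: "One", releasedate: "2004-12-28T18:46:23+01:00" },
    { name: "Two", releasedate: "2005-01-03T10:00:00+01:00" },
    { name: "Untitled" },
  ],
};

mapAlbumsByDate(artist);
console.log(JSON.stringify(rows));
// [{"key":[2004,12,28],"value":1},{"key":[2005,1,3],"value":1}]
```

The date is split with a regular expression rather than `new Date()` so that the key reflects the date as written, independent of the server's time zone.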
Each row in the view now has a date array as its key and the number 1
as its value, one row per record released that day (so there are as
many identical keys as there were records for a given day).
When querying with group=true, the reduce function is called on each
set of identical keys to produce a single value per key:
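A reduce function for this view only has to add up the values. The sketch below uses the `(keys, values, rereduce)` signature that CouchDB reduce functions receive; the `queryGrouped` helper and the sample rows are hypothetical and merely imitate what grouping on identical keys does. CouchDB's built-in `_sum` reduce should give the same result for this case.

```javascript
// Reduce function in the shape CouchDB expects: (keys, values, rereduce).
// Summing works for both the reduce and the rereduce pass, because the
// partial results are themselves counts.
function reduceCount(keys, values, rereduce) {
  return values.reduce((total, n) => total + n, 0);
}

// Hypothetical stand-in for a grouped query: rows whose keys are identical
// are reduced together into one output row.
function queryGrouped(rows, reduceFn) {
  const groups = new Map();
  for (const { key, value } of rows) {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  }
  // reduceCount does not look at the keys, so the sketch passes null for them.
  return [...groups.values()].map(({ key, values }) => ({
    key,
    value: reduceFn(null, values, false),
  }));
}

// Sample rows as the map step would emit them (made-up dates).
const emitted = [
  { key: [2004, 12, 28], value: 1 },
  { key: [2004, 12, 28], value: 1 },
  { key: [2005, 1, 3], value: 1 },
];

const grouped = queryGrouped(emitted, reduceCount);
console.log(JSON.stringify(grouped));
// [{"key":[2004,12,28],"value":2},{"key":[2005,1,3],"value":1}]
```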
With the group_level parameter, I can control whether I want to
group by day (group=true or group_level=3, as above), by month
(group_level=2), or year (group_level=1):
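The effect of group_level can be pictured as truncating each array key to its first N elements before grouping. The sketch below imitates that behaviour on made-up rows; `queryWithGroupLevel` is a hypothetical helper, not a CouchDB function, and it hard-codes the sum that the reduce function performs.

```javascript
// Hypothetical imitation of group_level: keep only the first `level`
// elements of each array key, then sum the values per truncated key.
function queryWithGroupLevel(rows, level) {
  const totals = new Map();
  for (const { key, value } of rows) {
    const prefix = key.slice(0, level);
    const id = JSON.stringify(prefix);
    const entry = totals.get(id) || { key: prefix, value: 0 };
    entry.value += value;
    totals.set(id, entry);
  }
  return [...totals.values()];
}

// Made-up rows, already in key order as a view would return them.
const dateRows = [
  { key: [2004, 11, 5], value: 1 },
  { key: [2004, 12, 28], value: 1 },
  { key: [2004, 12, 28], value: 1 },
  { key: [2005, 1, 3], value: 1 },
];

const byDay = queryWithGroupLevel(dateRows, 3);
const byMonth = queryWithGroupLevel(dateRows, 2);
const byYear = queryWithGroupLevel(dateRows, 1);

console.log(JSON.stringify(byMonth));
// [{"key":[2004,11],"value":1},{"key":[2004,12],"value":2},{"key":[2005,1],"value":1}]
console.log(JSON.stringify(byYear));
// [{"key":[2004],"value":3},{"key":[2005],"value":1}]
```

This is why the array key pays off: a single view answers the per-day, per-month and per-year questions, with the level chosen at query time.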
The code of each script is similar in the way Russian dolls are
similar: each one extends the previous one, digging deeper into
the nested structure of the original document.