Today is a bit juicier than the previous days put together. On the menu: advanced views (full MapReduce), replication, conflict management, and change monitoring.
Advanced views in CouchDB are, as noted yesterday, materialized output of MapReduce computations.
This has a cost: because the results are stored, computing them takes more time than in other implementations, at least the first time.
Updating the views, on the other hand, is fairly fast (CouchDB recomputes only what is necessary). Views have to be planned, but once in place they are cheap to query. For exploratory queries, other databases might be more appropriate.
CouchDB’s reduce functions distinguish between the first invocation and the following ones (on values that have already gone through the reduce function). This makes it possible to implement a function that counts the number of values: the first invocation transforms values into numbers, and the following ones add the numbers up.
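A minimal sketch of such a counting reduce function, written as plain JavaScript so it runs under Node. CouchDB calls reduce functions as reduce(keys, values, rereduce); the name countValues and the simulated two-chunk run are illustrative assumptions. In a real design document the function would be stored as a string, and CouchDB's JavaScript view server also provides a sum() helper that could replace the explicit addition.

```javascript
// Counting reduce function in CouchDB's reduce(keys, values, rereduce) shape.
// countValues is a hypothetical name used for illustration.
function countValues(keys, values, rereduce) {
  if (rereduce) {
    // Later invocations: values are partial counts, so add them up.
    return values.reduce((total, n) => total + n, 0);
  }
  // First invocation: values are raw emitted values; each one counts as 1.
  return values.length;
}

// Simulate CouchDB reducing two chunks separately, then re-reducing.
const partialA = countValues(null, ["a", "b", "c"], false);
const partialB = countValues(null, [{ x: 1 }, "d"], false);
const total = countValues(null, [partialA, partialB], true);
console.log(partialA, partialB, total); // 3 2 5
```

The first invocation ignores the values themselves, which is why this works even when they are not numbers.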
Replication is the one-way process of copying the changes of one database to another. Replication can be between any two databases, whether on the same server or on different ones. It can be one-time or continuous. The documents to replicate can be filtered, or selected by ID.
Replication is a lower-level mechanism than what MongoDB, for instance, proposes (where there is a strict hierarchy of masters and slaves), and closer to the flexible approach of Riak.
Of course, when concurrent writes are permitted, conflicts can occur; CouchDB detects them so they can be dealt with.
First, conflicts cannot happen on a single server: updates to a document must refer to the latest revision, otherwise the update fails. So clients know immediately that they need to resubmit the (merged) document.
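To make the single-server rule concrete, here is a toy in-memory model, not CouchDB's implementation: an update is accepted only if it names the document's current revision. The class name ToyStore and the "N-toy" revision format are hypothetical simplifications (real revisions are a generation number followed by a hash).

```javascript
// Toy model (not CouchDB code) of revision-checked updates on one server.
class ToyStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  // Returns { ok: true, rev } on success, { ok: false, error } otherwise.
  put(id, body, rev) {
    const current = this.docs.get(id);
    const currentRev = current ? current.rev : undefined;
    if (rev !== currentRev) {
      // Missing or stale revision: reject, as CouchDB does with 409 Conflict.
      return { ok: false, error: "conflict" };
    }
    const generation = current ? parseInt(current.rev, 10) + 1 : 1;
    const newRev = `${generation}-toy`; // hypothetical revision format
    this.docs.set(id, { rev: newRev, body });
    return { ok: true, rev: newRev };
  }
}

const store = new ToyStore();
const created = store.put("band", { name: "Example" }); // rev 1-toy
const first = store.put("band", { name: "Example", country: "Spain" }, created.rev);
const stale = store.put("band", { name: "Example", genre: "rock" }, created.rev);
console.log(first.ok, stale.ok); // true false
// The losing client merges its change into the latest version and retries.
const retry = store.put(
  "band",
  { name: "Example", country: "Spain", genre: "rock" },
  first.rev
);
console.log(retry.rev); // 3-toy
```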
When replication is enabled, conflicts result from concurrent updates in two replicated databases. At the next replication, one version is selected as the winner and replicated to the other databases. The other versions remain accessible as well (initially, only in the losing databases).
If two-way replication is in place, all databases will eventually have the _conflicts attribute populated (with all the losing revisions, if there is more than one).
This makes it possible to implement remedial actions: one can define views containing only the documents in conflict, or filter changes for conflicts, and implement merging actions in monitoring scripts.
CouchDB documentation helpfully provides some advice for designing conflict-aware applications.
Changes are dedicated views that contain a list of updates for a specific database. Their parameters support starting at a given revision (in this case, a database revision, not a document revision), filtering documents, and keeping the stream open in several ways.
This makes it possible (easy, even) to monitor (interesting or relevant) changes, to synchronize with other systems, or to automatically resolve conflicts, for instance.
When using long polling, I found that on very large datasets the JSON.parse invocation could take a long time, so I would suggest always using a limit parameter on the query, to cut the dataset down to manageable chunks.
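A minimal sketch of that advice, assuming a CouchDB server at http://localhost:5984 and a placeholder database name: each long-polling request asks for at most limit changes, and the next request resumes from the last_seq of the previous response. The helper names changesUrl and nextSince are hypothetical; since is passed through unchanged because, depending on the CouchDB version, sequences may be numbers or opaque strings.

```javascript
// Hypothetical helper: URL for one long-polling _changes request that
// returns at most `limit` changes after sequence `since`.
function changesUrl(baseUrl, db, since, limit) {
  const params = new URLSearchParams({
    feed: "longpoll",
    since: String(since),
    limit: String(limit),
  });
  return `${baseUrl}/${encodeURIComponent(db)}/_changes?${params}`;
}

// Hypothetical helper: where the next request should start. Resuming at
// last_seq means no change is skipped or received twice.
function nextSince(response, previousSince) {
  return response.last_seq !== undefined ? response.last_seq : previousSince;
}

console.log(changesUrl("http://localhost:5984", "music", 0, 100));
// http://localhost:5984/music/_changes?feed=longpoll&since=0&limit=100
```

Keeping each response small bounds the cost of every JSON.parse call, at the price of a few more round trips.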
Built-in Reduce Functions
There are three, documented on the Wiki.
_sum: this function behaves just like the reduce function from the book; it sums the values by key. It is useful when the map function uses emit(key, 1); (or some other numeric value).
_count: similar to _sum, but it counts the number of values rather than summing them. It is useful when the value is not a number.
_stats: an extension of _sum which computes additional statistics (minimum, maximum, …) on the numeric values.
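To see what _stats returns, here is a plain JavaScript emulation, a sketch rather than CouchDB's own (Erlang) implementation. It produces the same five fields as the built-in (sum, count, min, max, sumsqr) and shows how partial results from separate chunks are combined during rereduce; the name statsReduce is hypothetical.

```javascript
// Emulation of the _stats built-in reduce (not CouchDB's actual code).
function statsReduce(keys, values, rereduce) {
  if (!rereduce) {
    // First pass: summarise raw numeric values.
    return {
      sum: values.reduce((acc, v) => acc + v, 0),
      count: values.length,
      min: Math.min(...values),
      max: Math.max(...values),
      sumsqr: values.reduce((acc, v) => acc + v * v, 0),
    };
  }
  // Rereduce: combine partial summaries produced by earlier calls.
  return {
    sum: values.reduce((acc, s) => acc + s.sum, 0),
    count: values.reduce((acc, s) => acc + s.count, 0),
    min: Math.min(...values.map((s) => s.min)),
    max: Math.max(...values.map((s) => s.max)),
    sumsqr: values.reduce((acc, s) => acc + s.sumsqr, 0),
  };
}

const chunk1 = statsReduce(null, [1, 2, 3], false);
const chunk2 = statsReduce(null, [10], false);
const combined = statsReduce(null, [chunk1, chunk2], true);
console.log(JSON.stringify(combined));
// {"sum":16,"count":4,"min":1,"max":10,"sumsqr":114}
```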
Filters are nicely described in CouchDB: The Definitive Guide.
To create a new filter, I first create a design document to store the function:
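A minimal sketch of such a design document, built in Node so the filter can be tested before it is uploaded. The design document name _design/filters is hypothetical; CouchDB expects filter functions as source strings under filters, and passes the request's query parameters in req.query.

```javascript
// Filter function: keep only documents whose country attribute matches the
// country query parameter of the _changes request.
const byCountry = function (doc, req) {
  return doc.country === req.query.country;
};

// Design document storing the filter as a string.
const filtersDesignDoc = {
  _id: "_design/filters", // hypothetical design document name
  filters: {
    by_country: byCountry.toString(),
  },
};

// Quick local check with hand-made documents and request.
const sampleReq = { query: { country: "Spain" } };
console.log(byCountry({ country: "Spain" }, sampleReq));  // true
console.log(byCountry({ country: "France" }, sampleReq)); // false
console.log(JSON.stringify(filtersDesignDoc, null, 2));
```

The JSON printed at the end is what would be PUT to the design document's URL in the target database.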
The by_country function retrieves a country parameter from the request and compares it against the record's country attribute; only the matching records are returned.
To monitor only updates to bands from Spain, for instance, I can request the changes feed with this filter and a country parameter set to Spain.
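Assuming the filter is stored in a hypothetical _design/filters document and the bands live in a placeholder database named music, the request could be built like this: CouchDB's filter parameter names the function as design document name and filter name separated by a slash, and any extra parameters (here country) reach the filter through req.query.

```javascript
// Hypothetical helper: URL of a filtered, long-polling _changes feed.
function filteredChangesUrl(baseUrl, db, filter, queryParams) {
  const params = new URLSearchParams({
    feed: "longpoll",
    filter, // "<design document name>/<filter name>"
    ...queryParams, // extra parameters, visible to the filter as req.query
  });
  return `${baseUrl}/${encodeURIComponent(db)}/_changes?${params}`;
}

const spainUrl = filteredChangesUrl(
  "http://localhost:5984", // assumed local server
  "music",                 // placeholder database name
  "filters/by_country",    // hypothetical design document name + filter
  { country: "Spain" }
);
console.log(spainUrl);
// http://localhost:5984/music/_changes?feed=longpoll&filter=filters%2Fby_country&country=Spain
```

URLSearchParams encodes the slash as %2F, which the server decodes like any other query-string character.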
To monitor for conflicts, I have the following design document:
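A minimal sketch of a conflicts filter in the same style. The names _design/conflicts and only_conflicts are hypothetical, and the sketch assumes, as the approach here relies on, that the document handed to a filter carries its _conflicts array when it has losing revisions. The revision strings in the checks are placeholders.

```javascript
// Filter function: keep only documents with at least one losing revision.
const onlyConflicts = function (doc, req) {
  return !!(doc._conflicts && doc._conflicts.length > 0);
};

const conflictsDesignDoc = {
  _id: "_design/conflicts", // hypothetical design document name
  filters: {
    only_conflicts: onlyConflicts.toString(), // hypothetical filter name
  },
};

console.log(onlyConflicts({ _id: "a", _conflicts: ["2-abc"] }, {})); // true
console.log(onlyConflicts({ _id: "b" }, {}));                        // false
```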
With that, I can then listen for changes, keeping only the conflicts:
CouchDB only sets the _conflicts attribute on the losing database; the winning database (the one in which the winning revision was initially created) does not know about the conflict. This means I must check against music-repl rather than against the original database.
Replication HTTP API
The API is documented here.
To use it, simply POST the source and target databases to the _replicate URL.
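As a sketch of what that request carries, the helper below builds a POST to /_replicate. source and target can be database names or full URLs, continuous is optional, and doc_ids or filter (with query_params) select or filter the documents, matching the replication options described earlier. The helper name, server address and the source database name are placeholders; with Node 18 or later the result could be sent with fetch(request.url, request).

```javascript
// Hypothetical helper: request description for CouchDB's POST /_replicate.
function replicateRequest(baseUrl, source, target, options = {}) {
  const body = { source, target };
  if (options.continuous) body.continuous = true; // keep running
  if (options.docIds) body.doc_ids = options.docIds; // select by ID
  if (options.filter) {
    body.filter = options.filter; // "<design document>/<filter name>"
    if (options.queryParams) body.query_params = options.queryParams;
  }
  return {
    method: "POST",
    url: `${baseUrl}/_replicate`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

const request = replicateRequest(
  "http://localhost:5984", // assumed local server
  "music",                 // placeholder source database
  "music-repl",            // target database
  { continuous: true }
);
console.log(request.body); // {"source":"music","target":"music-repl","continuous":true}
```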
The _replicator database is an alternative to the _replicate URL above: documents inserted in the _replicator database will, if properly formed, cause a replication job to be started (either one-off or continuous).
Deleting the document will cancel the replication job.
Documents describing replications are updated to reflect the progress of the job.
Inserting such a document, naming a source and a target, triggers a replication from one database to the other.
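A minimal sketch of such a document and of the request that inserts it, with a hypothetical job ID and placeholder database names. The document only needs source and target (plus continuous for a continuous job); CouchDB then updates it as the job progresses.

```javascript
// Sketch of a _replicator document; inserting it starts the job and
// deleting it cancels the job.
function replicatorJob(id, source, target, continuous = false) {
  return {
    _id: id,    // hypothetical job name
    source,     // database to read changes from
    target,     // database to write them to
    continuous, // keep replicating as new changes arrive
  };
}

// Hypothetical helper: PUT request that stores the job in _replicator.
function replicatorPutRequest(baseUrl, job) {
  return {
    method: "PUT",
    url: `${baseUrl}/_replicator/${encodeURIComponent(job._id)}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job),
  };
}

const job = replicatorJob("music-to-repl", "music", "music-repl", true);
const putRequest = replicatorPutRequest("http://localhost:5984", job);
console.log(putRequest.url); // http://localhost:5984/_replicator/music-to-repl
```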
By running the watch_changes_longpolling_impl.js script on the _replicator database, it is possible to monitor the replication job.
The first change is when the document is created; the second when the job starts, and the third when it successfully completes.
Unlike jobs started through the _replicate-based API, continuous jobs stored in _replicator will resume when the database is restarted.
Continuous watcher skeleton
The approach is to keep input in a buffer, extract as many lines from the buffer as possible (if the last line is incomplete, it is put back into the buffer), and parse each line as a JSON object.
The format of each parsed object is different: each change is in its own object, so there is no results attribute anymore.
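A minimal sketch of that buffering approach, independent of the original skeleton: LineBuffer is a hypothetical name, and blank lines are skipped on the assumption that they are heartbeats. In Node, chunks would come from an http.get response's data events after res.setEncoding("utf8"), which keeps multi-byte characters from being split between chunks.

```javascript
// Incremental parser for a continuous _changes feed, which sends one JSON
// object per line; network chunks can end in the middle of a line.
class LineBuffer {
  constructor(onChange) {
    this.buffer = "";
    this.onChange = onChange;
  }

  push(chunk) {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last piece is "" or an incomplete line: keep it for later.
    this.buffer = lines.pop();
    for (const line of lines) {
      const text = line.trim();
      if (text === "") continue; // skip blank (heartbeat) lines
      this.onChange(JSON.parse(text));
    }
  }
}

// Simulated chunks: the second change is split across two chunks.
const received = [];
const parser = new LineBuffer((change) => received.push(change.id));
parser.push('{"seq":1,"id":"a"}\n{"seq":2,');
parser.push('"id":"b"}\n\n');
console.log(received); // [ 'a', 'b' ]
```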
Continuous watcher implementation
I just inserted the code block above in the original
watch_changes_skeleton.js; no other modifications were required.
With the code block above, both the long polling and the continuous outputs are identical.
As I said above, conflicts are only recorded in the losing database, so to test this I must use the replicated database.
Otherwise, the code is simple: iterate on the _conflicts attribute and, for each revision it contains, emit that revision as a key.
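A minimal sketch of such a map function, with a tiny harness that supplies emit so it can run under Node. Using the document _id as the emitted value is an assumption, as are the sample documents and revision strings.

```javascript
// Map function: one row per losing revision, keyed by that revision.
// Emitting doc._id as the value is an assumption.
const conflictsMap = function (doc) {
  if (doc._conflicts) {
    for (var i = 0; i < doc._conflicts.length; i++) {
      emit(doc._conflicts[i], doc._id);
    }
  }
};

// Minimal stand-in for CouchDB's view server: collect emitted rows.
function runMap(mapFn, docs) {
  const rows = [];
  globalThis.emit = (key, value) => rows.push({ key, value });
  try {
    docs.forEach((doc) => mapFn(doc));
  } finally {
    delete globalThis.emit;
  }
  return rows;
}

const rows = runMap(conflictsMap, [
  { _id: "band-1", _conflicts: ["2-aaa", "2-bbb"] },
  { _id: "band-2" },
]);
console.log(rows);
```

In CouchDB itself, conflictsMap.toString() would be stored as the map of a view in a design document, and emit would be provided by the view server.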
And this completes Day 3 and this overview of CouchDB.