Today the book covers all kinds of query goodness in MongoDB: indexing, advanced group queries, and MapReduce.
Once again, the contrast with Riak is stark. MongoDB is able to optimize queries on its JSON documents because it understands the format directly (whereas a document is stored as an opaque block in Riak). Using JavaScript is also simpler: no need to quote the function code; just pass a function object to the commands that need one.
Indexes
MongoDB comes by default with fairly sophisticated indexing options. Perhaps not as many as PostgreSQL, but still very flexible: two basic types, range (B-Tree) and geospatial indexes; compound keys (with the ability to sort each key in a different order); sparse, …
Combined with the explain function, this makes classic (i.e. non-MapReduce) queries usable.
Thus MongoDB is a good hybrid between traditional databases (although document rather than schema oriented), and new MapReduce platforms such as Hadoop.
Aggregation
MongoDB also supports a number of aggregation functions. The most flexible one, group, is not compatible with sharding, but otherwise it provides yet more coverage of relational database features.
MapReduce
Using MongoDB’s mapreduce is much easier than using Riak’s: the functions do not have to be passed as strings, they can be stored in the server directly from the shell, and because MongoDB understands JSON directly, there is no need to first parse the document.
On the other hand, Riak’s agnostic approach makes it possible to MapReduce other kinds of data.
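To make the "just pass function objects" point concrete, here is a minimal in-memory sketch of the map, emit and reduce flow. It is not MongoDB's implementation: runMapReduce is a hypothetical stand-in, the sample documents are made up, and emit is installed as a temporary global only to mimic what the server provides to the map function.

```javascript
// Hypothetical in-memory stand-in for a mapReduce run (not MongoDB's code).
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // serialized key -> { key, values }
  const previousEmit = globalThis.emit;
  // The server provides emit() to map functions; this imitates it locally.
  globalThis.emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  };
  try {
    // As in MongoDB, the document being mapped is bound to `this`.
    for (const doc of docs) map.call(doc);
  } finally {
    globalThis.emit = previousEmit;
  }
  // A key with a single emitted value is returned without calling reduce.
  return Array.from(groups.values(), ({ key, values }) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// Plain function objects, no quoting needed.
const countByCountry = function () {
  emit(this.country, { count: 1 });
};
const sumCounts = function (key, values) {
  let total = 0;
  for (const value of values) total += value.count;
  return { count: total };
};

// Made-up sample documents.
const phones = [{ country: 1 }, { country: 1 }, { country: 7 }];
const report = runMapReduce(phones, countByCountry, sumCounts);
```

The map and reduce functions are ordinary JavaScript values here, which is the property that makes the shell so much more pleasant than string-quoted functions.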
Exercises
Shortcut for the admin commands
I could not find a single place with the info. The mongo shell API has no central list of functions; instead they are spread across the documentation or the source for each prototype.
In general, an admin command that takes a MongoDB object as a first argument will have an equivalent method in the relevant prototype.
For instance, the dbStats command takes a DB; in the db.js source file of the DB prototype, there is a stats method that invokes the dbStats command.
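The pattern can be sketched as follows. FakeDB is a hypothetical stand-in for the shell's DB prototype, and the exact shape of the command document is an assumption; only the idea (a prototype method that wraps runCommand) comes from the source file mentioned above.

```javascript
// Hypothetical stand-in for the shell's DB prototype; it only records commands.
function FakeDB(name) {
  this.name = name;
  this.sent = []; // every command document passed to runCommand
}

// Stand-in for the real runCommand, which would send the command to the server.
FakeDB.prototype.runCommand = function (command) {
  this.sent.push(command);
  return { ok: 1, db: this.name };
};

// The shortcut: a method on the prototype that builds the command document
// and delegates to runCommand (the command shape here is an assumption).
FakeDB.prototype.stats = function (scale) {
  return this.runCommand({ dbStats: 1, scale: scale });
};

const book = new FakeDB("book");
const reply = book.stats();
```

So when looking for the shortcut of a given command, the prototype of its first argument is the place to search.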
Online documentation for Queries and Cursors
As stated in the documentation, MongoDB returns a cursor for each query; it is up to the client to iterate over the cursor to retrieve results.
The mongo shell usually hides the existence of cursors, but even there it is possible to expose them, using JavaScript.
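A rough sketch of what iterating by hand looks like. The shell's cursors offer hasNext() and next(); arrayCursor below is a hypothetical array-backed stand-in so that the loop can run outside the shell, and the town names are made-up sample data.

```javascript
// Hypothetical array-backed stand-in exposing the two cursor methods used below.
function arrayCursor(docs) {
  let position = 0;
  return {
    hasNext: () => position < docs.length,
    next: () => {
      if (position >= docs.length) throw new Error("cursor exhausted");
      return docs[position++];
    },
  };
}

// Pull at most `limit` documents, the way a client walks a cursor.
function take(cursor, limit) {
  const results = [];
  while (results.length < limit && cursor.hasNext()) {
    results.push(cursor.next());
  }
  return results;
}

const cursor = arrayCursor([
  { name: "Portland" },
  { name: "Salem" },
  { name: "Eugene" },
]);
const firstTwo = take(cursor, 2); // the cursor remembers where it stopped
const rest = take(cursor, 10);    // only the remaining document comes back
```

In the mongo shell the same loop would work on a real cursor, for instance one assigned with var cursor = db.towns.find().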
MongoDB documentation for MapReduce
The documentation is here.
Collection function code
In each case, I got the code by running db.towns.functionName (note the absence of parentheses). The mongo shell’s direct access to JavaScript source code is especially convenient.
Collection help
The source code is just a long list of print statements.
Collection findOne
The code will first execute a query, returning a cursor. The cursor is then checked for the presence of results; if there are any, the first one is returned.
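That logic can be paraphrased in a few lines. This is a simplified sketch, not the shell's actual source: findOneSketch and fakeCollection are hypothetical names, the stand-in collection only does exact-match filtering, and returning null for an empty result is an assumption.

```javascript
// Simplified paraphrase of the logic described above.
function findOneSketch(collection, query) {
  const cursor = collection.find(query); // 1. run the query, get a cursor
  if (!cursor.hasNext()) return null;    // 2. no result at all (assumed null)
  return cursor.next();                  // 3. otherwise hand back the first one
}

// Hypothetical in-memory collection with exact-match filtering only.
function fakeCollection(docs) {
  return {
    find(query) {
      const matches = docs.filter((doc) =>
        Object.keys(query).every((field) => doc[field] === query[field])
      );
      let position = 0;
      return {
        hasNext: () => position < matches.length,
        next: () => matches[position++],
      };
    },
  };
}

// Made-up sample data.
const towns = fakeCollection([
  { name: "Portland", state: "OR" },
  { name: "Salem", state: "OR" },
]);
const hit = findOneSketch(towns, { state: "OR" });
const miss = findOneSketch(towns, { state: "WA" });
```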
Collection stats
This function simply delegates the job to the runCommand method, invoking the collStats command.
Finalize method
The finalize function is very simple: it renames the attribute count to total.
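A minimal sketch of such a function, assuming the reduce step produces values of the form { count: n }; the parameter names are my own choice.

```javascript
// Called once per key after the last reduce; whatever it returns is stored
// as the final value for that key.
const finalize = function (key, reducedValue) {
  // Rename the count attribute to total.
  return { total: reducedValue.count };
};

// Quick check on a made-up reduced value.
const finalized = finalize({ country: 1 }, { count: 3 });
```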
To use it, just add the finalize attribute to the mapReduce command.
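A sketch of what the command document could look like. The collection name phones and the output collection phones.report come from the text; buildReportCommand is a hypothetical helper, and the map and reduce functions are simplified placeholders rather than the ones from the book.

```javascript
// Hypothetical helper that assembles the command document.
function buildReportCommand(map, reduce, finalize) {
  return {
    mapReduce: "phones",  // source collection
    map: map,
    reduce: reduce,
    finalize: finalize,   // the only addition to the command
    out: "phones.report", // collection that receives the results
  };
}

// Simplified placeholder functions (emit is provided by the server).
const mapByCountry = function () {
  emit(this.country, { count: 1 });
};
const reduceCounts = function (key, values) {
  let total = 0;
  for (const value of values) total += value.count;
  return { count: total };
};
const countToTotal = function (key, reducedValue) {
  return { total: reducedValue.count };
};

const command = buildReportCommand(mapByCountry, reduceCounts, countToTotal);
```

In the mongo shell, the resulting object would then be passed to db.runCommand(command).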
Finally, I can check the result with db.phones.report.find().
Use of driver
I used Java, and simply reimplemented the original Phones collection in a different database.
For the complete project, I just used Maven to fetch the MongoDB driver.
Ugly, but it does the job.
And that’s all for today.