Wakatta!

Like Eureka!, only cooler

Seven Databases in Seven Weeks MongoDB Day 2

Today the book covers all kinds of query goodness in MongoDB: indexing, advanced group queries, and MapReduce.

Once again, the contrast with Riak is stark. MongoDB can optimize queries on its JSON documents because it understands the format directly (whereas Riak stores them as opaque blobs). Using JavaScript is also simpler: there is no need to quote the function code; just pass a function object to the commands that need one.

Indexes

MongoDB comes by default with fairly sophisticated indexing options: perhaps not as many as PostgreSQL, but still very flexible. There are two basic types, range (B-tree) and geospatial indexes; compound indexes over several keys (with the ability to sort each key in a different order); sparse, …
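To make "sort each key in a different order" concrete, here is a small in-memory sketch of the ordering a compound index spec such as { country: 1, number: -1 } describes (1 ascending, -1 descending). In the shell of that era such an index would be created with something like db.phones.ensureIndex({ "components.country": 1, "components.number": -1 }). The helper comparatorFromSpec is hypothetical and only illustrates key ordering; it says nothing about how MongoDB actually stores its B-trees.

```javascript
// Hypothetical helper: turn an index-style spec ({ field: 1 | -1, ... })
// into an Array.prototype.sort comparator. Fields are compared in spec order;
// 1 means ascending and -1 descending, as in a MongoDB index definition.
function comparatorFromSpec(spec) {
  const fields = Object.keys(spec);
  return (a, b) => {
    for (const field of fields) {
      const dir = spec[field];
      if (a[field] < b[field]) return -dir;
      if (a[field] > b[field]) return dir;
    }
    return 0; // equal on every indexed field
  };
}

const phones = [
  { country: 2, number: 5550001 },
  { country: 1, number: 5550000 },
  { country: 2, number: 5550003 },
  { country: 1, number: 5550002 },
];

// Ascending by country, then descending by number within each country.
const ordered = [...phones].sort(comparatorFromSpec({ country: 1, number: -1 }));
console.log(ordered.map(p => `${p.country}:${p.number}`).join(" "));
// 1:5550002 1:5550000 2:5550003 2:5550001
```

A mixed-direction compound index is what lets a query sorted on { country: 1, number: -1 } walk the index in order instead of sorting in memory.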

Combined with the explain function, which reports how a query is executed (including whether it uses an index), this makes classic (i.e. non-MapReduce) queries practical.

Thus MongoDB is a good hybrid between traditional databases (although document- rather than schema-oriented) and newer MapReduce platforms such as Hadoop.

Aggregation

MongoDB also supports a number of aggregation functions. The most flexible one, group, does not work with sharded collections, but otherwise it covers yet more relational database features.
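The shell's group helper takes a key (the fields to group on), an initial accumulator, and a reduce(document, accumulator) function that folds each document into its group. The sketch below imitates that contract locally with a hypothetical groupLocal helper; it leaves out cond, keyf and finalize, and never talks to a server. It is only meant to show the semantics.

```javascript
// Hypothetical local stand-in for group(): partition documents by the fields
// named in `key`, start each group from a fresh copy of `initial`, and let
// `reduce(doc, acc)` fold every document into its group's accumulator.
function groupLocal({ key, initial, reduce }, docs) {
  const groups = new Map(); // JSON-encoded key -> result object
  for (const doc of docs) {
    // Build the group key from the requested fields only.
    const keyObj = {};
    for (const field of Object.keys(key)) keyObj[field] = doc[field];
    const id = JSON.stringify(keyObj);
    if (!groups.has(id)) {
      // Each result holds the key fields plus a deep copy of `initial`.
      groups.set(id, { ...keyObj, ...JSON.parse(JSON.stringify(initial)) });
    }
    reduce(doc, groups.get(id));
  }
  return [...groups.values()];
}

const docs = [{ country: 1 }, { country: 2 }, { country: 1 }];
const counts = groupLocal(
  {
    key: { country: true },
    initial: { count: 0 },
    reduce: (doc, acc) => { acc.count += 1; },
  },
  docs
);
console.log(JSON.stringify(counts));
// [{"country":1,"count":2},{"country":2,"count":1}]
```

Because every document of a group has to meet the same accumulator, this style of aggregation is naturally tied to a single node, which fits the sharding limitation mentioned above.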

MapReduce

Using MongoDB’s MapReduce is much easier than using Riak’s: the functions do not have to be passed as strings, they can be stored on the server directly from the shell, and because MongoDB understands JSON directly, there is no need to parse the document first.
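The flow can be imitated in memory to see why no parsing step is needed: map runs with `this` bound to the document and reads its fields directly, emitted values are grouped by key, reduce folds each group, and an optional finalize reshapes the result. mapReduceLocal is a hypothetical helper, not MongoDB's implementation; in particular emit is passed in as an argument here, whereas inside MongoDB it is a global available to map.

```javascript
// Hypothetical in-memory imitation of the mapReduce flow.
function mapReduceLocal(docs, map, reduce, finalize) {
  const buckets = new Map(); // JSON-encoded key -> { key, values }
  const emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!buckets.has(id)) buckets.set(id, { key, values: [] });
    buckets.get(id).values.push(value);
  };
  for (const doc of docs) {
    // map reads fields straight off `this`: the document is already an object
    map.call(doc, emit);
  }
  const results = [];
  for (const { key, values } of buckets.values()) {
    // MongoDB does not call reduce for a key with a single value; mimic that
    let value = values.length === 1 ? values[0] : reduce(key, values);
    if (finalize) value = finalize(key, value);
    results.push({ _id: key, value });
  }
  return results;
}

const phones = [
  { components: { country: 1 } },
  { components: { country: 2 } },
  { components: { country: 1 } },
];
const out = mapReduceLocal(
  phones,
  function (emit) { emit(this.components.country, { count: 1 }); },
  (key, values) => ({ count: values.reduce((sum, v) => sum + v.count, 0) })
);
console.log(JSON.stringify(out));
// [{"_id":1,"value":{"count":2}},{"_id":2,"value":{"count":1}}]
```

Note that map must be a regular function, not an arrow function, so that `this` refers to the current document.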

On the other hand, Riak’s agnostic approach makes it possible to MapReduce other kinds of data.

Exercises

Shortcut for the admin commands

I could not find a single place with this information. The mongo shell API has no central list of functions; instead, they are spread across the documentation or the source of each prototype.

In general, an admin command that takes a MongoDB object, such as a database or a collection, as its first argument has an equivalent method in the relevant prototype.

For instance, the dbStats command takes a DB; in the db.js source file of the DB prototype, there is a stats method that invokes the dbStats command.
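The delegation pattern can be sketched with a stub whose runCommand simply echoes the command document instead of sending it to a server. FakeDB is hypothetical and the real db.js source differs in its details and across versions; the point is only that the helper is a thin wrapper building a { dbStats: 1 } command document (the optional scale argument is a real dbStats option).

```javascript
// Hypothetical stub: runCommand() echoes the command document back
// instead of sending it to a server, so the delegation is visible.
class FakeDB {
  runCommand(cmd) {
    return { ok: 1, command: cmd };
  }

  // Shell-style helper: a thin wrapper around the admin command.
  stats(scale) {
    return this.runCommand({ dbStats: 1, scale: scale });
  }
}

const db = new FakeDB();
console.log(JSON.stringify(db.stats(1024)));
// {"ok":1,"command":{"dbStats":1,"scale":1024}}
```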

Online documentation for Queries and Cursors

As stated in the documentation, MongoDB returns a cursor for each query; it is up to the client to iterate over the cursor to retrieve the results.

The mongo shell usually hides cursors (it iterates over the first batch of results, prints them, and reports "has more" if there are others), but even there it is possible to work with them explicitly using JavaScript.
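Explicit iteration follows the hasNext()/next() protocol that shell cursors expose; in the shell this reads var cursor = db.phones.find(); while (cursor.hasNext()) printjson(cursor.next()). The ArrayCursor class below is a hypothetical in-memory stand-in with the same three methods; a real cursor fetches batches from the server lazily instead of holding an array.

```javascript
// Hypothetical in-memory cursor with the hasNext()/next()/forEach() trio.
class ArrayCursor {
  constructor(docs) {
    this.docs = docs;
    this.pos = 0;
  }
  hasNext() {
    return this.pos < this.docs.length;
  }
  next() {
    if (!this.hasNext()) throw new Error("no more documents");
    return this.docs[this.pos++];
  }
  forEach(fn) {
    while (this.hasNext()) fn(this.next());
  }
}

// Explicit iteration, the same loop one would write in the shell.
const cursor = new ArrayCursor([{ _id: 1 }, { _id: 2 }, { _id: 3 }]);
const seen = [];
while (cursor.hasNext()) {
  seen.push(cursor.next()._id);
}
console.log(seen); // [ 1, 2, 3 ]
```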

MongoDB documentation for MapReduce

The documentation is here.

Collection function code

In each case, I got the code by running db.towns.functionName (note the absence of parentheses). The mongo shell's direct access to JavaScript source code is especially convenient.
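This works because the shell is a JavaScript interpreter: evaluating a method without calling it yields the function object, and displaying a function shows its source, which in plain JavaScript is what Function.prototype.toString returns. The coll object below is a hypothetical stand-in for a collection.

```javascript
// Hypothetical collection-like object with one method.
const coll = {
  docs: [{ _id: 1 }, { _id: 2 }],
  count: function () {
    return this.docs.length;
  },
};

console.log(coll.count());          // 2 (calls the method)
console.log(coll.count.toString()); // prints the method's source code
```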

Collection help

The source code is just a long list of print statements.

Collection findOne

The code first executes a query, which returns a cursor. It then checks whether the cursor has any results; if it does, the first one is returned.
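The same steps can be sketched with a JavaScript generator playing the role of the cursor. This findOne is a hypothetical standalone function taking an array and a predicate, not the shell's signature (which takes a query document); returning null when nothing matches mirrors what the shell's findOne prints in that case.

```javascript
// Hypothetical lazy "query": yields matching documents one at a time.
function* filterDocs(docs, predicate) {
  for (const doc of docs) {
    if (predicate(doc)) yield doc;
  }
}

// findOne: run the query, check for a result, return the first one or null.
function findOne(docs, predicate) {
  const cursor = filterDocs(docs, predicate);
  const first = cursor.next();
  return first.done ? null : first.value;
}

const towns = [{ name: "Portland" }, { name: "Punxsutawney" }];
console.log(findOne(towns, t => t.name.startsWith("P"))); // { name: 'Portland' }
console.log(findOne(towns, t => t.name === "Tokyo"));     // null
```

Because the generator is lazy, only the first matching document is ever produced, much as a real cursor would not need to fetch every result.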

Collection stats

This function simply delegates the job to the runCommand method, invoking the collStats command.

Finalize method

The finalize function is very simple: it renames the count attribute to total:

finalize function
finalize = function(key, value) {
    return { total: value.count };
}
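The rename belongs in finalize rather than in reduce because MongoDB may feed reduce's output back into reduce, so reduce must return the same { count } shape that map emits. The check below runs locally; the reduce function is an assumption (the post does not show it) that sums count fields, consistent with the { count: ... } values finalize reads.

```javascript
// Assumed reduce: sums the count fields of the values for one key.
const reduce = function (key, values) {
  let total = 0;
  for (const v of values) total += v.count;
  return { count: total };
};

// finalize from the text: rename count to total.
const finalize = function (key, value) {
  return { total: value.count };
};

// Re-reduce: a partial result is combined with a freshly emitted value.
const partial = reduce("k", [{ count: 1 }, { count: 1 }]);
const combined = reduce("k", [partial, { count: 1 }]);
console.log(combined);                // { count: 3 }
console.log(finalize("k", combined)); // { total: 3 }
```

Had reduce returned { total: ... } directly, the re-reduce step would read an undefined count and produce NaN.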

To use it, just add the finalize attribute to the mapReduce command:

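results = db.runCommand({
    mapReduce: 'phones',
    map: map,            // map and reduce: the functions defined earlier in the book's chapter
    reduce: reduce,
    finalize: finalize,  // the finalize function above
    out: 'phones.report'
})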

Finally, I can check the result with db.phones.report.find():

> db.phones.report.find()
{ "_id" : { "digits" : [ 0, 1, 2, 3, 4, 5, 6 ], "country" : 1 }, "value" : { "total" : 35 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 4, 5, 6 ], "country" : 2 }, "value" : { "total" : 30 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 4, 5, 6 ], "country" : 3 }, "value" : { "total" : 35 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 4, 5, 6 ], "country" : 4 }, "value" : { "total" : 22 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 4, 5, 6 ], "country" : 5 }, "value" : { "total" : 35 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 4, 5, 6 ], "country" : 6 }, "value" : { "total" : 19 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 4, 5, 6 ], "country" : 7 }, "value" : { "total" : 32 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 4, 5, 6 ], "country" : 8 }, "value" : { "total" : 32 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 5 ], "country" : 1 }, "value" : { "total" : 7 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 5 ], "country" : 2 }, "value" : { "total" : 5 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 5 ], "country" : 3 }, "value" : { "total" : 5 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 5 ], "country" : 4 }, "value" : { "total" : 10 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 5 ], "country" : 5 }, "value" : { "total" : 6 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 5 ], "country" : 6 }, "value" : { "total" : 4 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 5 ], "country" : 7 }, "value" : { "total" : 6 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 5 ], "country" : 8 }, "value" : { "total" : 5 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 5, 6 ], "country" : 1 }, "value" : { "total" : 116 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 5, 6 ], "country" : 2 }, "value" : { "total" : 103 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 5, 6 ], "country" : 3 }, "value" : { "total" : 118 } }
{ "_id" : { "digits" : [ 0, 1, 2, 3, 5, 6 ], "country" : 4 }, "value" : { "total" : 104 } }
has more

Use of driver

I used Java, and simply recreated the original phones collection, populated the same way as in the book, in a separate database named java:

MongoTest.java
package jp.wakatta;

import static java.lang.Math.floor;
import static java.lang.Math.random;
import static java.lang.Math.round;

import java.util.List;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

public class MongoTest {
  public static void main(String... args) {
      Mongo m = null;
      try {
          // connect to the database server
          // note: use 127.0.0.1 instead of localhost, as on my machine
          // mongod only listens on the loopback interface and not
          // on the ethernet one
          m = new Mongo("127.0.0.1");

          // make sure we're in a clean state
          m.dropDatabase("java");

          // create and access the database
          DB db = m.getDB("java");

          // create the collection and populate it
          DBCollection phones = db.getCollection("phones");
          populatePhones(800, 5550000, 5650000, phones);

          // create an index on the display field
          phones.createIndex(new BasicDBObject("display", 1));

          // list the indexes
          List<DBObject> list = phones.getIndexInfo();
          for (DBObject o : list) {
              System.out.println(o);
          }
      } catch (Exception ex) {
          ex.printStackTrace();
      } finally {
          // close and clean up, even if something failed above
          if (m != null) {
              m.close();
          }
      }
  }

  // Java port of the book's populatePhones shell function
  public static void populatePhones(long area, long start, long stop, DBCollection coll) {
      for (long i = start; i < stop; i++) {
          // random country code between 1 and 8
          long country = round(floor(1 + (random() * 8)));
          long num = (country * 10000000000L) + (area * 10000000L) + i;
          BasicDBObject phone = new BasicDBObject();
          BasicDBObject components = new BasicDBObject();
          phone.put("_id", num);
          components.put("country", country);
          components.put("area", area);
          // prefix is the leading digits of the number (5550000 -> 555)
          components.put("prefix", i / 10000);
          components.put("number", i);
          phone.put("components", components);
          phone.put("display", "+" + country + " " + area + "-" + i);
          coll.insert(phone);
      }
  }
}
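The document built by populatePhones can be checked in isolation. The sketch below redoes the arithmetic in JavaScript, as in the book's shell version: _id = country × 10^10 + area × 10^7 + i, with prefix assumed to be the leading digits of the seven-digit number (i / 10000, truncated). phoneDoc is a hypothetical helper; all values stay far below 2^53, so JavaScript numbers represent them exactly, just like Java's long.

```javascript
// Hypothetical helper reproducing the populatePhones document layout.
function phoneDoc(country, area, i) {
  return {
    _id: country * 1e10 + area * 1e7 + i,
    components: {
      country: country,
      area: area,
      prefix: Math.floor(i / 1e4), // 5550000 -> 555
      number: i,
    },
    display: "+" + country + " " + area + "-" + i,
  };
}

console.log(JSON.stringify(phoneDoc(1, 800, 5550000)));
// {"_id":18005550000,"components":{"country":1,"area":800,"prefix":555,"number":5550000},"display":"+1 800-5550000"}
```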

For the complete project, I just used Maven to fetch the MongoDB driver:

pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>jp.wakatta</groupId>
  <artifactId>mongo-test</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>mongo-test</name>
  <dependencies>
      <dependency>
          <groupId>org.mongodb</groupId>
          <artifactId>mongo-java-driver</artifactId>
          <version>2.7.2</version>
      </dependency>
  </dependencies>
  <build>
      <plugins>
          <plugin>
              <groupId>org.apache.maven.plugins</groupId>
              <artifactId>maven-compiler-plugin</artifactId>
              <version>2.3.2</version>
              <configuration>
                  <source>1.6</source>
                  <target>1.6</target>
              </configuration>
          </plugin>
      </plugins>
  </build>
</project>

Ugly, but it does the job.

And that’s all for today.
