Wakatta!

Like Eureka!, only cooler

Seven Databases in Seven Weeks CouchDB Day 2

Day 2 is about Views in CouchDB, which serve as an introduction to the more general MapReduce support.

It is another fairly short day, as much of this section is actually about the complexities of XML parsing…

Like Riak and MongoDB, CouchDB is scripted with JavaScript, so today has a feeling of déjà vu.

View concept

A View is just a mapping of a key to a value. Keys and values are extracted from documents; there can be more than one key for each document, as in MongoDB.

Once the view has been built and updated for the documents it applies to, it can be accessed by key using optimized methods (all based on some form of lexicographical order).

View performance

A View in CouchDB is essentially the equivalent of a materialized view in relational databases.

Access to the view causes it to be updated (i.e. recomputed) if necessary, which can be a painfully slow experience. I had imported the whole content of the music database (26990 records), and each time I tested a Temporary View or saved a Permanent one, I had to wait for CouchDB to finish the refresh (fortunately not too long on this dataset).

It interesting to note that while relational databases require the schema to be designed ahead of time, but support arbitrary queries, CouchDB let you ignore the schema, but need you to design the queries ahead of time.

Exercises

emit function

The key can be any JSON object, although I would say that only strings and arrays of strings have sensible semantics.

Arrays can be used with reduce functions to provide query time custom grouping, as explained here.

For instance, to compute the number of records by date, I used the releasedate of each album to create a key array [year, month, date], and a value of 1 (1 for each album):

Album by Date
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
function(doc) {
    if ('name' in doc && 'albums' in doc) {
        doc.albums.forEach(function(album) {
            if ('name' in album && 'releasedate' in album) {
                var d = new Date(album.releasedate);
                if (d != null) {
                    var key = [d.getFullYear(),
                               d.getMonth(),
                               d.getDate()];
                    emit(key, 1);
                }
            }
        });
    }
}

As I intend to use grouping, I also need a reduce function:

Reduce Function
1
2
3
function(keys, values, rereduce) {
    return sum(values);
}

Each document in the view is now a date as an array, with a single number for the record made that date (there are as many identical keys as there were records for a given day).

When querying, by default, the reduce function will be called on identical keys to get a single value:

1
2
3
4
5
6
7
8
$ curl http://localhost:5984/music/_design/count/_view/album_by_date?limit=5\&group=true
{"rows":[
{"key":[2004,11,18],"value":2},
{"key":[2004,11,21],"value":1},
{"key":[2004,11,22],"value":1},
{"key":[2004,11,28],"value":1},
{"key":[2004,11,29],"value":2}
]}

(month is 0 based…)

With the group_level parameter, I can control whether I want to group by day (group=true or group_level=3, as above), by month (group_level=2), or year (group_level=1):

1
2
3
4
5
6
7
8
$ curl http://localhost:5984/music/_design/count/_view/album_by_date?limit=5\&group_level=2
{"rows":[
{"key":[2004,11],"value":7},
{"key":[2005,0],"value":3},
{"key":[2005,1],"value":3},
{"key":[2005,2],"value":5},
{"key":[2005,3],"value":15}
]}
1
2
3
4
5
6
7
8
$ curl http://localhost:5984/music/_design/count/_view/album_by_date?limit=5\&group_level=1
{"rows":[
{"key":[2004],"value":7},
{"key":[2005],"value":306},
{"key":[2006],"value":1277},
{"key":[2007],"value":3454},
{"key":[2008],"value":7330}
]}

View request parameters

There are quite a few of them listed here.

Random artist script

The code is essentially the same as the one mapping names to ids, but here it associates random to name.

Random Artist
1
2
3
4
5
function(doc) {
    if ('name' in doc && 'random' in doc) {
        emit(doc.random, doc.name);
    }
}

Random artist URL

The URL below returns the first artist whose random number is greater than the random one generated by Ruby.

1
2
3
4
$ curl http://localhost:5984/music/_design/random/_view/artist?limit=1\&startkey=`ruby -e 'p rand'`
{"total_rows":26987,"offset":23180,"rows":[
{"id":"364215","key":0.8581072409917536,"value":"rakombiacje"}
]}

As expected, if given a value too large (for instance, 1), the query returns nothing:

1
2
$ curl http://localhost:5984/music/_design/random/_view/artist?limit=1\&startkey=1
{"total_rows":26987,"offset":26987,"rows":[]}

Random everything

The code of each script is similar, in a way Russian Dolls are similar: each one is an extension of the previous, digging deeper into the nested structure of the original document.

Random Album

Random Album
1
2
3
4
5
6
7
8
9
10
11
function(doc) {
    if ('name' in doc && 'albums' in doc) {
        doc.albums.forEach(function(album) {
            if ('name' in album && 'random' in album) {
                var key = album.random,
                value = { by: doc.name, album: album.name };
                emit(key, value);
            }
        });
    }
}

Testing:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
$ curl http://localhost:5984/music/_design/random/_view/album?limit=1\&startkey=`ruby -e 'p rand'` | python -mjson.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   142    0   142    0     0  36419      0 --:--:-- --:--:-- --:--:-- 71000
{
    "offset": 19416, 
    "rows": [
        {
            "id": "357995", 
            "key": 0.35656765622628905, 
            "value": {
                "album": "Demo Releses", 
                "by": "SilvanestY"
            }
        }
    ], 
    "total_rows": 54669
}

Random Track

Random Track
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
function(doc) {
    if ('name' in doc && 'albums' in doc) {
        doc.albums.forEach(function(album) {
            if ('tracks' in album) {
                album.tracks.forEach(function(track) {
                    if ('name' in track) {
                        var key = track.random,
                        value = { by: doc.name,
                                  album: album.name,
                                  track: track.name };
                        emit(key, value);
                    }
                });
            }
        });
    }
}

Testing:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
$ curl http://localhost:5984/music/_design/random/_view/track?limit=1\&startkey=`ruby -e 'p rand'` | python -mjson.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   163    0   163    0     0  10294      0 --:--:-- --:--:-- --:--:-- 11642
{
    "offset": 118485, 
    "rows": [
        {
            "id": "370935", 
            "key": 0.3460755726665503, 
            "value": {
                "album": "Mako Yama", 
                "by": "Mako Yama", 
                "track": "Kinu No Tsuki"
            }
        }
    ], 
    "total_rows": 342013
}

Random Tag

Random Tag
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
function(doc) {
    if ('name' in doc && 'albums' in doc) {
        doc.albums.forEach(function(album) {
            if ('tracks' in album) {
                album.tracks.forEach(function(track) {
                    if ('name' in track && 'tags' in track) {
                        track.tags.forEach(function(tag) {
                            if ('idstr' in tag) {
                                var key = tag.random,
                                value = { by: doc.name,
                                          album: album.name,
                                          track: track.name,
                                          tag: tag.idstr };
                                emit(key, value);
                            }
                        });
                    }
                });
            }
        });
    }
}

Testing:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
$ curl http://localhost:5984/music/_design/random/_view/tag?limit=1\&startkey=`ruby -e 'p rand'` | python -mjson.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   218    0   218    0     0   9717      0 --:--:-- --:--:-- --:--:-- 10380
{
    "offset": 151963, 
    "rows": [
        {
            "id": "340779", 
            "key": 0.2788540070246309, 
            "value": {
                "album": "CABACA-5", 
                "by": "CC Asia Band", 
                "tag": "electricguitar", 
                "track": "CC Asia Band - CABACA - 35 CCilly Love Song"
            }
        }
    ], 
    "total_rows": 545892
}

And that’s it for Day 2.

Comments