March 13, 2010
What’s new in Apache CouchDB 0.11 — Part Two: Views; JOINs Redux, Raw Collation for Speed
Hi again! It’s Jan again. Thanks for coming back. If you missed Part One, here’s your chance to catch up.
CouchDB JOINs Redux
When I started out talking about CouchDB (back in 2006), people were rarely aware of any databases that didn’t use SQL for querying. A frequent question was “How do I do JOINs?” The short answer is “You don’t”. People worried about retrieving “related data” from a non-relational database.
Turns out “related data” and “relation” have very little in common (the first is groups of data, the second is a mathematical term that refers to a multivalued mapping, more commonly called a “table”). Long story short, there was (and still is) confusion.
Of course, CouchDB lets you retrieve related data in any shape or form you like. Christopher Lenz did a great write-up on “CouchDB ‘JOINs’” dating as far back as 2007, and it is still very applicable.
Since then, though, CouchDB has gained a few new features to tackle the same problem: fetching related data. These aren’t new in 0.11, but they did get refined, so it makes sense to revisit them here. Since 0.10, you have been able to query a view with the query parameter include_docs=true. When specified, CouchDB fetches, for each row in the view result, the corresponding document from the database. This lets you trade smaller view indexes (and hence shorter view indexing times) against slower view queries (for each row, CouchDB makes a separate lookup in the database).
With 0.11, you can include an _id member in the value of a view row and have CouchDB fetch the document with that id instead of the document that produced the row.
As an example, consider these four documents:
{
  "_id": "Claire",
  "title": "VP of Official Attitude"
}
{
  "_id": "Mikeal",
  "title": "VP of Pastries and Automating Stuff"
}
{
  "_id": "Jason",
  "title": "VP of Hosting and Lightning"
}
{
  "_id": "team",
  "members": ["Claire", "Mikeal", "Jason"]
}
And this map function:
function(doc) {
  if (doc.members) {
    doc.members.forEach(function(member) {
      emit(member, {_id: member});
    });
  }
}
The regular result looks like this:
{
  "total_rows": 3,
  "offset": 0,
  "rows": [
    {"key":"Claire", "value":{"_id":"Claire"}},
    {"key":"Jason", "value":{"_id":"Jason"}},
    {"key":"Mikeal", "value":{"_id":"Mikeal"}}
  ]
}
If you query the view with include_docs=true, the result looks like this:
{
  "total_rows": 3,
  "offset": 0,
  "rows": [
    {
      "key":"Claire",
      "value":{"_id":"Claire"},
      "doc": {"_id":"Claire","title":"VP of Official Attitude"}
    },
    {
      "key":"Jason",
      "value":{"_id":"Jason"},
      "doc": {"_id":"Jason","title":"VP of Hosting and Lightning"}
    },
    {
      "key":"Mikeal",
      "value":{"_id":"Mikeal"},
      "doc": {"_id":"Mikeal","title":"VP of Pastries and Automating Stuff"}
    }
  ]
}
Pretty slick, don’t you think?
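If you want to play with the lookup rule without a running CouchDB, here is a minimal in-memory sketch in plain JavaScript. It is not how CouchDB implements this internally. The helpers buildRows and includeDocs are hypothetical names made up for illustration, and emit is passed into the map function explicitly so it runs outside the view server. The sketch also keeps the id of the emitting document on each row, which the abbreviated results above leave out.

```javascript
// Sketch only: an in-memory model of the include_docs lookup rule.
// buildRows and includeDocs are hypothetical helpers, not CouchDB APIs.
const docs = {
  Claire: { _id: "Claire", title: "VP of Official Attitude" },
  Mikeal: { _id: "Mikeal", title: "VP of Pastries and Automating Stuff" },
  Jason: { _id: "Jason", title: "VP of Hosting and Lightning" },
  team: { _id: "team", members: ["Claire", "Mikeal", "Jason"] }
};

// The map function from the example, with emit passed in explicitly.
function map(doc, emit) {
  if (doc.members) {
    doc.members.forEach(function (member) {
      emit(member, { _id: member });
    });
  }
}

// Run the map function over every document and collect the emitted rows.
function buildRows(allDocs) {
  const rows = [];
  Object.values(allDocs).forEach(function (doc) {
    map(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  // Plain code-unit comparison; good enough for these ASCII keys.
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return rows;
}

// Mimic include_docs=true: prefer value._id, fall back to the row's own id.
function includeDocs(rows, allDocs) {
  return rows.map(function (row) {
    const linked =
      row.value && typeof row.value === "object" && row.value._id;
    const targetId = linked || row.id;
    return Object.assign({}, row, { doc: allDocs[targetId] || null });
  });
}

const result = includeDocs(buildRows(docs), docs);
console.log(JSON.stringify(result, null, 2));
```

All three rows come from the "team" document, yet each one carries a different team member as its doc. A row whose value has no _id falls back to the document that emitted it, which matches the plain include_docs=true behaviour described earlier.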
Raw Collation
This one is a quickie for speed freaks.
By default, all views are sorted in a locale-dependent Unicode collation order. This ensures that text in any language sorts naturally rather than in an artificial byte order.
This is great, but sometimes you don’t need Unicode-aware sorting. CouchDB 0.11 lets you set an option in the view definition to enable raw collation for that view.
{
  "_id": "_design/app",
  "views": {
    "faster": {
      "map": "function(doc) {emit(doc.field, 1);}",
      "options": {
        "collation": "raw"
      }
    }
  }
}
Views that are built with this option avoid calling out to the ICU (International Components for Unicode) library to sort all rows. Hence the speed-up. How much faster depends on your data and hardware, but the difference can be significant.
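To get a feel for what changes when you switch collation, here is a rough analogy in plain JavaScript. It is not CouchDB’s code: CouchDB compares keys on the Erlang side, and the details of how raw collation orders different JSON types are not covered here. The sketch assumes a JavaScript engine with Intl support (in engines such as V8 this is typically backed by ICU) and uses a made-up list of keys.

```javascript
// Analogy only: contrasts byte-style ordering with Unicode-aware ordering.
// The keys are made up for illustration.
const keys = ["b", "A", "a", "B"];

// Raw-style ordering: compare UTF-16 code units, no locale rules involved.
function rawCompare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Unicode-aware ordering through the engine's collator ("en" is an assumption).
const collator = new Intl.Collator("en");

const rawOrder = keys.slice().sort(rawCompare);
const unicodeOrder = keys.slice().sort(collator.compare);

console.log("raw:    ", rawOrder);
console.log("unicode:", unicodeOrder);
```

In the raw ordering, all uppercase letters come before all lowercase ones, because that is how the code units are numbered. The collator keeps the two forms of each letter next to each other. So keep in mind that the raw option changes the order of your view rows as well as the build time.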
If you feel like it, create a small benchmark, publish the numbers on your blog and let us know! We’ll post a follow-up and compare everybody’s results.
Next up in our series are the new features of the CouchDB Replicator. Stay tuned!