What’s new in Apache CouchDB 0.11 — Part One: Nice URLs with Rewrite Rules and Virtual Hosts

Hi! Jan’s writing.

With CouchDB 0.11 just around the corner, I thought I’d let you in on some secrets.

This is a small series of blog posts that sheds some light on what is new in CouchDB 0.11.0. Why should you care? The 0.11 branch of CouchDB is a feature-freeze branch for CouchDB 1.0, which I hope to ship in just a few weeks.

CouchDB 0.10 was already in good shape to make a lot of users happy, and versions as early as 0.7 were used successfully in production over three years ago. 0.11 doesn’t add a whole lot of new features, but it comes with a few convenience features that will make your 1.0 experience even more pleasant. The CouchDB dev team also did a lot of work on security, performance and robustness, but that is not the focus of this series.

Nice URLs

Ever since CouchApps, applications that run in your browser and are served right out of CouchDB, were introduced, users have asked for “nicer URLs”. CouchDB’s URLs are not very bad by default, but there is certainly room for improvement.

The URL for querying a view in CouchDB looks like this:

/database/_design/app/_view/name[?optional=parameters]

Functionally, not very bad: /database is the database; _design/app is the application or design document; _view/name is the name of the view. Pretty straightforward.

What if the index page of your application is a list of blog posts? It looks like this:

/sofa/_design/sofa/_list/posts/all?descending=true&limit=5

Still understandable, but probably not what you want to show your users on their first (or any) visit.

CouchDB 0.11 lets you create nicer URLs. The path to nicer URLs includes two separate steps: URL Rewriting and Virtual Hosts.

URL Rewriting

The URL rewriter in CouchDB gives you full control over the URL space, but with a twist: A new URL endpoint (_rewrite) on your design documents lets you rewrite any incoming request to a regular CouchDB API URL. Here is an example:

To rewrite /db/_design/app/_rewrite/blog to /db/_design/app/_list/posts/all?descending=true&limit=5 you simply add a new attribute rewrites to your design document:

{
  "_id": "_design/app",
  "_rev": "1-ACDEE781EAD81DA...",
  "rewrites": [
    {
      "from": "/blog",
      "to": "_list/posts/all?descending=true&limit=5"
    }
  ],
  "views": { ... },
  "lists": {... },
  ...
}

Now all requests to /db/_design/app/_rewrite/blog are rewritten to /db/_design/app/_list/posts/all?descending=true&limit=5 internally.

Note that the rewrite target in the "to" attribute is relative to the design document it is defined in. You can use ../ to add a rewrite target that is not part of this design document.
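To make the "relative to the design document" part concrete, here is a minimal Python sketch of how a target could be resolved. This is only a model of the behaviour described above, not CouchDB's actual code; the function name resolve_target and the second design document "other" are made up for illustration.

```python
import posixpath


def resolve_target(design_doc_path, to):
    """Resolve a rewrite target relative to the design document path.

    Illustrative model only: it handles relative targets (including ../)
    and keeps any query string untouched.
    """
    # Split off the query string so only the path part gets normalised.
    path, sep, query = to.partition("?")
    resolved = posixpath.normpath(posixpath.join(design_doc_path, path))
    return resolved + sep + query


print(resolve_target("/db/_design/app", "_list/posts/all?descending=true&limit=5"))
# /db/_design/app/_list/posts/all?descending=true&limit=5

# "other" is a hypothetical second design document in the same database.
print(resolve_target("/db/_design/app", "../other/_view/name"))
# /db/_design/other/_view/name
```

The second call shows how a single ../ steps out of the current design document.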

Nice! But how is

/db/_design/app/_rewrite/blog

a significant improvement over

/db/_design/app/_list/posts/all?descending=true&limit=5

It is a little shorter, sure, and it bakes in some default parameters. Not bad, but the real trick is to hook this up to a virtual host.

Virtual Hosts

Virtual hosts allow you to map any domain name to a path inside CouchDB. For example, http://couch.io could point to /couchio/ to remove one part of the path from your URL.

The really cool trick here though is to map your virtual host to the rewriter URL and have http://couch.io point to /couchio/_design/app/_rewrite. To carry on with our blog example, all requests to http://couch.io/blog are now rewritten to /couchio/_design/app/_list/posts/all?descending=true&limit=5. Awesome cake!

How does it work? Virtual hosts operate solely on the HTTP level. This makes them very simple, yet flexible.

Each HTTP 1.1 request includes a mandatory header field Host: hostname.com with the server name it is trying to reach. You can tell CouchDB to look for that Host header and redirect all requests that match to any URL inside CouchDB by adding this to your configuration file local.ini:

[vhosts]
couch.io = /couchio/_design/app/_rewrite

Pretty neat, eh?

A note on security: Virtual hosts let you map domains to URLs in CouchDB, but they don’t protect against anyone accessing your raw CouchDB API. There’s at least one way around them, and it should be sufficient to illustrate why virtual hosts are not a security device:

Clients are in control of what HTTP headers they send. A malicious client could omit the Host header or simply send a valid HTTP 1.0 request, which doesn’t require a Host header at all. By default, whenever a request doesn’t match any virtual host, for any reason, CouchDB behaves like a regular instance without a virtual host defined.
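Here is a small Python sketch of that lookup, mostly to show why the fallback matters. It is a simplified model, not CouchDB's implementation: the function name resolve_vhost is hypothetical, and details such as ports in the Host header are ignored.

```python
def resolve_vhost(vhosts, headers, path):
    """Map a request path through the [vhosts] table, keyed by Host header.

    Simplified model: requests without a matching Host header fall through
    to the raw CouchDB API path unchanged.
    """
    host = headers.get("Host")
    prefix = vhosts.get(host) if host is not None else None
    if prefix is None:
        # No virtual host matched: behave like a plain CouchDB instance.
        return path
    return prefix.rstrip("/") + path


vhosts = {"couch.io": "/couchio/_design/app/_rewrite"}

# A well-behaved browser request lands on the rewriter.
print(resolve_vhost(vhosts, {"Host": "couch.io"}, "/blog"))
# /couchio/_design/app/_rewrite/blog

# A client that sends no Host header still reaches the raw API.
print(resolve_vhost(vhosts, {}, "/_all_dbs"))
# /_all_dbs
```

The second call is the whole point of the security note: the mapping is a convenience, so anything you need to protect has to be protected by other means.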

More Rewriting

Back to the rewriter. I only showed you the simplest rewrite rule, but there are a few more things you can do to make really useful rewrites.

First, rewrite rules can be defined in a list. In the above example, the list has only one element, one rewrite rule, but you can add more by simply adding {"from":"...","to":"..."} objects. Rewrite rules are applied in the order they were defined. If an incoming URL matches the first rule but also the second, the first one will always be used to do the rewrite.

Second, the "from" and "to" attributes (CouchDB calls them path specs) are really patterns. I used literal paths for the example to explain how things work, but a path spec can include placeholders. Named placeholders start with a colon (:foo) and there’s the “match all” placeholder *. It can be used to match “the rest of the path”, but not in the middle of a path spec.

Here are a few examples that use placeholders:

{
  "from": "/db/:doc",
  "to": "_show/name/:doc"
}

This rewrite rule routes every request for a document to a specific show function (name) and passes the document’s id to that show function. This allows you to modify your document on the way out of CouchDB transparently and on the fly.

{
  "from": "/db/*",
  "to": "_show/name/*"
}

This rewrite rule behaves a lot like the previous one, but in addition to rewriting the document id to a show function, it also passes along any optional query parameters a request might have. For example,

/db/doc?meta=true

would be rewritten to:

/db/_design/app/_show/name/doc?meta=true

Neat.
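If you want to play with the two placeholder kinds, here is a rough Python model of path spec matching. It is a sketch under my own assumptions, not the rewriter's real code: the names match_path_spec and apply_rule are invented, the design document prefix is left off the result, and query strings are not handled.

```python
def match_path_spec(spec, path):
    """Match a path against a path spec; return placeholder bindings or None."""
    spec_parts = spec.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    bindings = {}
    for i, part in enumerate(spec_parts):
        if part == "*":
            # "Match all" captures the rest of the path, so it must come last.
            if i != len(spec_parts) - 1:
                raise ValueError("'*' is only allowed at the end of a path spec")
            bindings["*"] = "/".join(path_parts[i:])
            return bindings
        if i >= len(path_parts):
            return None
        if part.startswith(":"):
            # Named placeholder: bind exactly one path segment.
            bindings[part[1:]] = path_parts[i]
        elif part != path_parts[i]:
            return None
    # Without a trailing "*", the path must not have extra segments.
    return bindings if len(path_parts) == len(spec_parts) else None


def apply_rule(rule, path):
    """Fill the placeholders of rule["to"] with the bindings from rule["from"]."""
    bindings = match_path_spec(rule["from"], path)
    if bindings is None:
        return None
    out = []
    for part in rule["to"].split("/"):
        if part == "*" or part.startswith(":"):
            key = "*" if part == "*" else part[1:]
            out.append(bindings[key])
        else:
            out.append(part)
    return "/".join(out)


print(apply_rule({"from": "/db/:doc", "to": "_show/name/:doc"}, "/db/mydoc"))
# _show/name/mydoc
print(apply_rule({"from": "/db/*", "to": "_show/name/*"}, "/db/mydoc"))
# _show/name/mydoc
```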

Speaking of query parameters, you can also define the target’s query parameters explicitly and fill them in from placeholders.

{
  "from": "/blog/:start",
  "to": "_list/posts/all",
  "query": {
    "startkey": ":start"
  }
}

This rewrites /blog/5 to /db/_design/app/_list/posts/all?startkey=5.
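As a rough illustration of that last step, this Python sketch builds the target from a rule and a set of placeholder bindings. It assumes the bindings have already been extracted from the path (here they are supplied by hand), and the function name build_target is made up; it is not how CouchDB itself is implemented.

```python
from urllib.parse import urlencode


def build_target(rule, bindings):
    """Append the rule's "query" object to its target, filling in placeholders."""
    query = {}
    for key, value in rule.get("query", {}).items():
        if value.startswith(":"):
            # Replace a placeholder such as ":start" with its bound value.
            value = bindings[value[1:]]
        query[key] = value
    target = rule["to"]
    if query:
        target += "?" + urlencode(query)
    return target


rule = {
    "from": "/blog/:start",
    "to": "_list/posts/all",
    "query": {"startkey": ":start"},
}

# For a request to /blog/5, the placeholder :start is bound to "5".
print(build_target(rule, {"start": "5"}))
# _list/posts/all?startkey=5
```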

What about request methods, you ask? Terrific question! The CouchDB rewriter has something in store for you:

[
  {
    "from": "/db/*",
    "to": "_update/name/*",
    "method": "POST"
  },
  {
    "from": "/db/*",
    "to": "_show/name/*",
    "method": "GET"
  }
]

These two rewrite rules point to different URLs depending on the HTTP request method. It doesn’t make a lot of sense to POST to a show function, so POST requests are rewritten to an update function instead, which can deal with them.
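Putting the ordering rule and the method check together, rule selection could be modelled like this in Python. Again, this is a sketch and not the real rewriter: pick_rule is a hypothetical name, only literal specs and specs ending in /* are handled, and treating a rule without "method" as matching any method is an assumption on my part.

```python
def pick_rule(rules, method, path):
    """Return the first rule whose method and "from" spec match the request."""
    for rule in rules:
        # Assumption: a rule without "method" applies to every request method.
        if rule.get("method", method) != method:
            continue
        spec = rule["from"]
        if spec.endswith("/*"):
            matched = path.startswith(spec[:-1])
        else:
            matched = path == spec
        if matched:
            # Rules are tried in order, so the first match wins.
            return rule
    return None


rules = [
    {"from": "/db/*", "to": "_update/name/*", "method": "POST"},
    {"from": "/db/*", "to": "_show/name/*", "method": "GET"},
]

print(pick_rule(rules, "POST", "/db/doc")["to"])
# _update/name/*
print(pick_rule(rules, "GET", "/db/doc")["to"])
# _show/name/*
print(pick_rule(rules, "DELETE", "/db/doc"))
# None
```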

Wrapping Up

This should be it for the first installment of this mini series. Tag along for more posts about the new features in CouchDB 0.11. I hope this makes you excited about CouchDB 1.0.