Guest Blogpost by Mark Headd, Creator of TweetMy311

A little bit about Mark — He is an experienced voice, mobile and web application developer who has built civic applications for the District of Columbia, the Sunlight Foundation, the New York State Senate, and the cities of New York, San Francisco and Toronto. He writes about open government, programming, communication technologies and open source software on his blog at voiceingov.org.

Now for his guest blogpost…

Building Twitter Apps with CouchDB

CouchDB is a sexy beast.

There are so many things about it that make it attractive to Web 2.0 and mashup developers, and it seems like the more I use it the more cool features I find that make my life as a developer easier.

The dead simple HTTP API. Replication. URL rewriting. I could go on…

Several months ago, I started a project to develop an application that would let citizens who live in cities that have adopted the Open311 standard submit service requests using Twitter. There are some important reasons why I believe Twitter makes an ideal interface for submitting geographically specific service requests to municipalities. When I started to consider various alternatives for a platform for the TweetMy311 application, I looked very closely at CouchDB. This was right around the time CouchDB 0.11 was released. I soon discovered something else cool about my favorite NoSQL database — CouchDB absolutely rocks as a platform for Twitter applications.

Why CouchDB?

If you’ve had a play with the Twitter API and you’re thinking about building an application that uses it, you should take a close look at CouchDB. There are a number of reasons why it makes an ideal platform for Twitter applications:

  • You interact with a CouchDB instance the same way that you interact with the Twitter API — by making HTTP calls. This can help keep the code for your application clean and simple, and provides lots of opportunities for code reuse within your application.
  • Documents in CouchDB are stored as JSON, which is one of the formats returned from the Twitter API when searching for Tweets (or “status objects” in Twitter parlance).
  • Documents in a CouchDB database are assigned a globally unique ID — it’s how documents are distinguished from one another. Twitter also uses unique identifiers for status objects, so using the ID of a Twitter status object as the ID for a document in CouchDB makes life pretty easy for a Twitter app developer.

These benefits become readily apparent when you begin interacting with the Twitter API and storing status objects in CouchDB.

A Quick Example

Twitter actually has several different APIs that developers can interact with — the basic REST API, the Search API and the Streaming API. Since there are lots of resources on these different APIs, and tons of good tutorials for how to use them, I’ll focus in this post on the most basic way to use HTTP requests to get Twitter status updates — using statuses/show. (The same approach described below can be used to get multiple status updates using the Twitter REST or Search APIs and store them in a CouchDB database.)

Consider the following Tweet:

This is a status update I sent when I was walking around my neighborhood in Wilmington, Delaware thinking about how Twitter could be used to start a 311 service request. If I wanted to get this status update in JSON format from the Twitter API, I would use the statuses/show method with the id of the status update.

This URL will return the full JSON object for the status update. So now we can start to see how this JSON structure can be saved into a CouchDB database.

First, create a CouchDB database to use for our example:

curl -X PUT http://127.0.0.1:5984/twittertest
{"ok":true}

Consider the following sample script (written in PHP):

This is a very simplistic example of how you can interact with the Twitter API and store JSON-formatted status updates in CouchDB. Note that you’ll need PHP version 5.2.0 or greater to take advantage of PHP’s JSON functions. Not counting our constant declarations and basic cURL functions, it takes 3 lines of code to grab a status update and store it in CouchDB. When you request this script with curl from the command line, you’ll see something like this:
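For readers who do not have PHP to hand, the same fetch-and-store idea can be sketched in Python. This is not the script from the post. It is a minimal illustration that assumes a status object has already been fetched from Twitter and decoded, that CouchDB is listening on http://127.0.0.1:5984, and that the database is the twittertest one created above. The helper names build_put_request and store_status are made up for this example.

```python
import json
import urllib.request

COUCHDB_URL = "http://127.0.0.1:5984"   # assumed local CouchDB instance
DATABASE = "twittertest"                # database created earlier in the post


def build_put_request(status):
    """Build (but do not send) a PUT that stores one status object.

    The tweet's own ID becomes the CouchDB document ID.
    """
    doc_id = str(status["id"])          # CouchDB document IDs are strings
    url = f"{COUCHDB_URL}/{DATABASE}/{doc_id}"
    body = json.dumps(status).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def store_status(status):
    """Send the PUT and return CouchDB's decoded JSON reply."""
    with urllib.request.urlopen(build_put_request(status)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes the interesting part (tweet ID in, document URL out) easy to check without a running server. Note that storing the same tweet a second time should be rejected by CouchDB as a conflict unless the document's current _rev is included in the body.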

curl http://127.0.0.1/twitter-example.php?status_id=7911766753
{"ok":true,"id":"7911766753","rev":"3-8855db513f6fc0d45bbbc66a6be02035"}

It may seem redundant to get the ID of the Twitter status update from the JSON object returned from the API (since we’re already using this ID to get the status update in the first place), but consider a scenario where you are interacting with the Twitter REST API to get @mentions, or searching for any status update that contains specific phrases. Those interactions could potentially return dozens or hundreds of Tweets at a time as a collection of JSON-formatted objects. By wrapping our simple example logic in a foreach() statement, you can easily process large volumes of status updates without having the IDs in advance.
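As a rough sketch of the many-tweets case, here is one way to prepare a whole batch in Python. Instead of one PUT per tweet inside a loop, it builds a single request body for CouchDB's _bulk_docs endpoint, which is a different route from the foreach() approach described above. It assumes the input is a plain list of decoded status objects, each with an id field; bulk_docs_payload is a hypothetical helper name.

```python
import json


def bulk_docs_payload(statuses):
    """Turn a list of decoded status objects into a _bulk_docs request body."""
    docs = []
    for status in statuses:
        doc = dict(status)              # copy so the caller's data is untouched
        doc["_id"] = str(status["id"])  # tweet ID doubles as the document ID
        docs.append(doc)
    return json.dumps({"docs": docs})
```

The resulting string would be sent as the body of a POST to http://127.0.0.1:5984/twittertest/_bulk_docs with a Content-Type of application/json.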

Ch-ch-ch-ch-Changes

Now that you’ve got a handle on just how easy it is to grab Twitter status updates and store them in CouchDB, you’re probably wondering what to do next in building your Twitter app. Let’s say your Twitter app looks for @mentions on a specific account and then processes those status updates based on the content of the Tweet, or by looking at the location of the Tweet (this is, by the way, what TweetMy311 does).

You’ll probably need to go through a series of steps, processing a status update at each step and ultimately sending a response back to a user via the Twitter API. Get a Tweet — process it n number of times —  send back a response. The series of steps involved in processing a Tweet (one step to potentially many steps, depending on what your application does) is the heart of any Twitter application, and can cause lots of complexity if not approached properly.

Fortunately, CouchDB makes this aspect of building Twitter applications incredibly easy to do through the _changes API. The _changes API is a powerful, easy-to-use mechanism for receiving notices when a change has been made to your CouchDB database. Each time a new document is inserted into a CouchDB database, or an existing document is updated, the _changes API will provide notice. The book “CouchDB: The Definitive Guide” has an entire section on the _changes API that goes into great detail about all the ways it can be used.

For the purposes of this post, let’s focus on the continuous changes API. This API lets you set up a single, persistent HTTP connection to CouchDB and get notices each and every time a document is inserted or updated. You can access the continuous changes API by simply running the following at the command line:

curl -X GET "http://127.0.0.1:5984/twittertest/_changes?feed=continuous"

If you run this command in a separate terminal window you’ll see the following:

{"seq":1,"id":"7911766753","changes":[{"rev":"1-daac85c534df91d607ea8a7aa8f46fb2"}]}

This is the first change (in sequence order) that has happened in the specified CouchDB database since it was created.

There are lots of options you can use to help you refine how the HTTP connection to the _changes API works, and which specific changes your Twitter application acts on. If the HTTP client you’re using is picky about how long it will keep a connection open without getting a response, you can use the “heartbeat” parameter to tell CouchDB to send a newline character at specific intervals, to tell your HTTP client that the connection is still alive.
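To make the shape of the continuous feed concrete, here is a small Python sketch of how a client might read it: one JSON object per line, with empty lines arriving as heartbeats. It works on any iterable of text lines, so it makes no assumption about which HTTP client supplies them. The function names are invented for this example.

```python
import json


def parse_change_line(line):
    """Decode one line from the continuous feed; None for heartbeat lines."""
    line = line.strip()
    if not line:
        return None                     # bare newline sent as a heartbeat
    return json.loads(line)


def changed_ids(lines):
    """Yield (seq, document ID) for every real change in an iterable of lines."""
    for line in lines:
        change = parse_change_line(line)
        # Skip heartbeats and any closing {"last_seq": ...} line, which has no "id".
        if change is not None and "id" in change:
            yield change["seq"], change["id"]
```

Each pair that comes out is enough to fetch the changed document and run the next processing step on it.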

You can also use the “since” parameter to specify the change sequence you want to act on. For example, if you run the following example in your terminal, you won’t see anything (yet) because there has been only one change to the database:

curl -X GET "http://127.0.0.1:5984/twittertest/_changes?feed=continuous&since=2"
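If your application builds these URLs in code rather than by hand, a helper along the following lines keeps the parameters in one place. This is only a sketch: changes_url is a hypothetical name, and the base URL and database are the ones used throughout this post.

```python
from urllib.parse import urlencode


def changes_url(base, database, since=None, heartbeat=None):
    """Build a continuous _changes URL with optional since/heartbeat parameters."""
    params = {"feed": "continuous"}
    if since is not None:
        params["since"] = since          # only report changes after this sequence
    if heartbeat is not None:
        params["heartbeat"] = heartbeat  # milliseconds between keep-alive newlines
    return f"{base}/{database}/_changes?{urlencode(params)}"
```

A long-running worker could remember the seq of the last change it handled and pass it as since when it reconnects, so that it picks up where it left off instead of replaying the whole history.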

Even more powerful, you can specify filters and apply them to the _changes API to access only the changes that meet specific tests. Filters live inside design documents and can be applied by using the “filter” parameter. For example, consider a Twitter application that processes incoming @mentions stored in a CouchDB database through a series of steps.

Let’s say that there is something specific that needs to happen on the third step of processing Tweets in your application (e.g., sending a response back to the user through the Twitter API). We want to use the _changes API to tease out any changes that are at step 3. We can do this by looking at the revision number in the _rev field of each document:

{
  "_id": "_design/process_tweet",
  "filters": {
    "step_3": "function(doc, req) { return doc._rev.split('-')[0] === '2'; }"
  }
}

With the filter defined like this, we can ask the _changes API for only those changes to status messages that have reached the third step in processing.
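The filter itself runs as JavaScript inside CouchDB, but the revision test it relies on is easy to illustrate separately. A CouchDB revision string is an integer generation, a hyphen, and a hash. The Python sketch below (with invented function names) parses the generation as a number, which matters once a document has been saved many times: a check on only the first character of _rev would also match generations 20 through 29.

```python
def revision_generation(rev):
    """Return the integer prefix of a CouchDB revision string."""
    generation, _, _ = rev.partition("-")
    return int(generation)


def ready_for_step_3(rev):
    """True when the document has been saved exactly twice."""
    return revision_generation(rev) == 2
```

For an application whose documents never see more than a handful of revisions the difference is academic, but comparing the whole generation costs nothing and removes the edge case.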

To test this filter, you can open up Futon and access the twittertest database. The Twitter status update we inserted earlier in this post should have a revision starting with a “1”. When you access that document in Futon and save it (thereby changing its revision to begin with a “2”) you will see it come through in the HTTP connection we have set up to the _changes API that uses our custom process_tweet filter. For example:

curl -X GET "http://127.0.0.1:5984/twittertest/_changes?feed=continuous&filter=process_tweet/step_3"
{"seq":3,"id":"7911766753","changes":[{"rev":"2-0e701a74cb89c65853fc25dbff152364"}]}

Towards Easier Twitter Apps

As you can see, there are lots of reasons why CouchDB is ideal for building Twitter applications. It’s easy to use, incredibly powerful and has lots of cool built-in features that can help Twitter application developers.

The primary challenge I faced in building TweetMy311 was time: I was the only developer on the project and my time to get it up and running was very limited. I didn’t want this constraint to diminish what the application could do, and I also wanted to make my approach repeatable, so I can build more cool Twitter apps in the future. CouchDB was the perfect choice for my project, and I’m proud to use it on a project that will soon help lots of people make their communities better.

The life of a Twitter developer can be challenging on the best of days —  make it easier, with CouchDB.