How To Use MongoDB

One of the really nice things (for me at any rate) about MongoDB is that I can structure my database records the same way that I structure my core data objects when hacking Clojure. Clojure’s representation of choice when it comes to data structures is what I call “structured maps”, which the rest of the Clojure community seems to refer to simply as “maps” and yet uses as objects. Python, JavaScript, Ruby and many other languages take the same approach to objects: the . access semantics really constitute a map lookup rather than a fixed struct offset. Since objects may extend multiple types, rather than storing (or computing in some platform-dependent way) a translation saying that .bar equals +0x50 in pointer arithmetic, these languages just make Object.a == Object['a']. This makes objects in general a breeze, as they are simply fancy maps that happen to have functions pointed to by keys.
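Python makes this concrete: attribute access on a plain object really is a dictionary lookup under the hood. A minimal sketch (the class and attribute names here are just for illustration):

```python
# A bare class: no declared fields, just a namespace dict per instance.
class Record:
    pass

r = Record()
r.bar = 42

# Dot access and dict access hit the same underlying map.
assert r.bar == r.__dict__['bar']
assert vars(r)['bar'] == 42
```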

Clojure just throws that right out too, taking the approach that rather than treat a map as an object with magical keys, you flat out use a map and write functions which manipulate such maps. This design philosophy leads to systems such as my out-of-order processor, where the fundamental data model is a map about whose keys assumptions are made, and for which transformation and “state change” functions are created. As MongoDB stores JSON maps (okay, fine, BSON maps), this makes building an idiomatic internal representation and then silently serializing/deserializing it flat with database reads and writes amazingly painless. However, it introduces one really bad design flaw, to which I confess this blog falls victim: _id key manipulation.
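The maps-plus-functions style is easy to sketch outside Clojure too. Here is the idea in Python, with hypothetical field names: a “post” is just a plain map, and a transformation function returns a new map rather than mutating the old one.

```python
# Clojure-style "state change" function, sketched in Python: take a map,
# return a fresh map with the changed keys, leave the original untouched.
def publish(post, timestamp):
    return {**post, "state": "posted", "posted-at": timestamp}

draft = {"title": "How To Use MongoDB", "state": "draft"}
posted = publish(draft, 1234567890)

assert draft["state"] == "draft"     # original map is unchanged
assert posted["state"] == "posted"
```

Since the record is already a plain map, handing it to the database layer is just a serialization step, with no object-relational translation in between.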

MongoDB is a multi-keyed datastore, in that it can do BST/BTree indexing of objects by several keys. However, the _id key is “magic”: _id values are all unique, and immutable. Now as a Clojure programmer, I’m right at home with immutable. The entire Clojure language (at least until you hit the underlying Java implementation) is built upon and for immutability of objects. However, Clojure’s immutability is flexible: “object” A can be used to create some object A’, constituting a “modification” of A which does not side-effect A itself. In Mongo, however, once a record is created, its _id key can never be changed. This makes assigning the _id key by hand a really bad idea, period. Anything you could do by f*cking with the _id key could also be accomplished by sorting on a separate id key (id != _id), and then you would be able to (re)order your records whenever you feel like it! This blog (as of this post at any rate) does the wrong thing here and sets the _id key by hand, to poor effect.

Intuitively, it seems safe to sequentially number posts. Posts are almost events, in that they become visible at some time and consequently should be displayed in order, as the news-feed design model dictates. This is great… if you can somehow keep posts strictly in order within the database. One way to do this is to index posts based on when they were “posted”, as that instant constitutes a Long value which is unique to millisecond resolution: more than unique enough for this blog, which has post-to-post latencies measured in weeks. However, then I have to deal with the Long overflow issue in 2048, and more importantly, when I hacked this blog together the equivalent of getTimeMillis was having a bad day and didn’t work for me. So instead I have a record in my Mongo instance which has the glorious duty of retaining the _id to be assigned to the next generated post (or draft), counting sequentially by 1 from 0. Now this makes sense… until I use that value to set the _id key and generate drafts and posts out of order.
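A toy version of that counter record, in Python. In real Mongo this would be done atomically with findAndModify and a {"$inc": {...}} update; here a plain dict stands in for the document, and the field names are mine:

```python
# One document whose only job is handing out the next sequential id.
counter = {"_id": "post-counter", "next": 0}

def next_id(counter):
    n = counter["next"]
    counter["next"] += 1   # the one mutation: bump the counter
    return n

assert next_id(counter) == 0
assert next_id(counter) == 1
assert counter["next"] == 2
```

The counter itself is fine; the mistake described next is feeding its output into _id rather than into an ordinary sortable field.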

  1. Create draft "A" - ID 0 - Time 0
  2. Create draft "B" - ID 1 - Time 1
  3. Create draft "C" - ID 2 - Time 2
  4. traffic I don't care about...
  5. Post draft "A" - ID 0 - Time 3
  6. Post draft "C" - ID 2 - Time 4
  7. Post draft "B" - ID 1 - Time 5

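The scenario above boils down to this: _id records creation order, but readers care about posting order, and the two disagree. A quick sketch:

```python
# Drafts A, B, C got _ids 0, 1, 2 at creation, but were posted at
# times 3, 4, 5 in the order A, C, B.
posts = [
    {"_id": 0, "title": "A", "posted-at": 3},
    {"_id": 2, "title": "C", "posted-at": 4},
    {"_id": 1, "title": "B", "posted-at": 5},
]

by_id   = [p["title"] for p in sorted(posts, key=lambda p: p["_id"])]
by_time = [p["title"] for p in sorted(posts, key=lambda p: p["posted-at"])]

assert by_id   == ["A", "B", "C"]   # frontpage sorted on _id
assert by_time == ["A", "C", "B"]   # the order posts actually appeared
```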
Now when I go to view my frontpage