How To Use MongoDB
10 Feb 2013

One of the really nice things (for me at any rate) about MongoDB is
that I can structure my database records the same way that I structure
my core data objects when hacking Clojure. Clojure's representation of
choice when it comes to data structures is what I call
"structured maps", which the rest of the Clojure community seems
to refer to simply as "maps" and yet uses as objects. Python,
Javascript, Ruby and many other languages take the approach to objects
that the `.` access semantics really constitute a map lookup rather than an array
reference. As objects may extend multiple types, rather than actually
storing (or computing in some platform-dependent way) a translation
that `.bar` is equal to `+0x50` in pointer arithmetic, they just have
`Object.a == Object['a']`. This makes objects in general a breeze, as
they are simply fancy maps that happen to have functions pointed to by
keys.
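That equivalence is easy to see in Python, where an object's attributes literally live in a dict (a minimal sketch; the class and names are made up for illustration):

```python
class Point:
    """A plain object whose attributes are stored in an ordinary dict."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)
# Attribute access is a map lookup, not pointer arithmetic:
print(p.x == p.__dict__["x"])  # True
print(vars(p))                 # {'x': 3, 'y': 4}
```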
Clojure just throws that right out too, taking the approach that
rather than treat a map as an object with magical keys, you just flat out
use a map and write functions which manipulate such maps. This design
philosophy leads to systems such as my out-of-order processor, where
the fundamental data model is a map about whose keys assumptions are
made and for which transformation and "state change" functions are
created. As MongoDB stores JSON maps (okay, fine, BSON maps), this makes
building an idiomatic internal representation and then flatly
serializing/deserializing it silently with database reads and writes
amazingly painless. However, it introduces one really bad design flaw
to which I confess this blog falls victim: `_id` key manipulation.
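As a rough illustration of why this is painless (in Python rather than Clojure, and with made-up field names; the driver does the actual BSON conversion for you), a structured map survives a serialization round trip untouched:

```python
import json

# A "structured map": plain nested data, no classes, no ORM layer.
post = {"title": "How To Use MongoDB", "tags": ["mongodb", "clojure"], "id": 42}

# Serialize out and back; the map you get is the map you stored.
restored = json.loads(json.dumps(post))
print(restored == post)  # True
```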
MongoDB is a multi-keyed datastore, in that it can do BST/BTree
indexing of objects by several keys. However, the `_id` key is "magic"
in that its values are all unique, and immutable. Now, as a Clojure
programmer, I'm right at home with immutable. The entire Clojure
language (at least until you hit the underlying Java implementation)
is built upon and for immutability of objects. However,
Clojure's immutability is flexible: "object" A can be used to create
some object A', constituting a "modification" of the object A
which yet does not side-effect the object A. In Mongo, however, once a
record is created, the `_id` key cannot be changed. This makes
assigning the `_id` key by hand a really bad idea,
period. Anything you could do by f*cking with the `_id` key
could also be accomplished by sorting on an `id` key (`id` != `_id`),
and you would be able to (re)order your records whenever you
feel like it! This blog (as of this post at any rate) does the wrong
thing here and sets the `_id` key by hand, to poor effect.
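A sketch of that sorting alternative, with hypothetical records in plain Python standing in for query results: the ordering lives in a mutable application-level `id` field, so the database-managed `_id` never has to change.

```python
posts = [
    {"id": 2, "title": "c"},
    {"id": 0, "title": "a"},
    {"id": 1, "title": "b"},
]

# Reordering is just a sort on the application-level key
# (in Mongo itself this would be a sort in the query).
ordered = sorted(posts, key=lambda p: p["id"])
print([p["title"] for p in ordered])  # ['a', 'b', 'c']
```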
Intuitively, it is safe to sequentially number posts. Posts are almost
events, in that they become visible at some time and consequently
should be displayed in order as the news-feed design model
dictates. This is great... if you can somehow keep posts strictly in
order within the database. One way to do this is to index posts based
on when they were "posted", as that instant constitutes a `Long` value
which is unique to millisecond resolution. More than unique enough for
this blog, which has post-to-post latencies measured in weeks. However, then
I have to deal with the `Long` overflow issue in 2048, and more
importantly, when I hacked this blog together the equivalent of
`getTimeMillis` was having a bad day and didn't work for me. So I just
have a record in my Mongo instance which has the glorious duty of
retaining the `_id` to be assigned to the next generated post (or draft),
counting sequentially by 1 from 0. Now this makes sense... until I use
that value to set the `_id` key and generate drafts and posts out of
order.
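The counter-record scheme looks roughly like this (plain Python standing in for what would be an atomic find-and-modify with `$inc` against the Mongo counter document; the names here are my own invention):

```python
def next_post_id(counter):
    """Hand out the reserved id and bump the counter record.
    (Against Mongo proper this read-and-increment would need to be
    a single atomic operation, not two separate steps.)"""
    assigned = counter["next_id"]
    counter["next_id"] += 1
    return assigned

counter = {"_id": "post-counter", "next_id": 0}
print(next_post_id(counter))  # 0
print(next_post_id(counter))  # 1
```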
- Create draft "A" - ID 0 - Time 0
- Create draft "B" - ID 1 - Time 1
- Create draft "C" - ID 2 - Time 2
- traffic I don't care about...
- Post draft "A" - ID 0 - Time 3
- Post draft "C" - ID 2 - Time 4
- Post draft "B" - ID 1 - Time 5
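The timeline above can be sketched directly (hypothetical records in Python): sorting by `_id` and sorting by post time disagree as soon as drafts go live out of order.

```python
drafts = [
    {"_id": 0, "title": "A", "posted": 3},
    {"_id": 1, "title": "B", "posted": 5},
    {"_id": 2, "title": "C", "posted": 4},
]

by_id   = [d["title"] for d in sorted(drafts, key=lambda d: d["_id"])]
by_time = [d["title"] for d in sorted(drafts, key=lambda d: d["posted"])]
print(by_id)    # ['A', 'B', 'C']
print(by_time)  # ['A', 'C', 'B'], the order in which posts actually appeared
```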
Now when I go to view my frontpage