Of Oxen, Carts and Ordering

06 Aug 2014

Well, the end of Google Summer of Code is in sight and it’s long past time for me to make a report on the state of my Oxcart Clojure compiler project. When last I wrote about it, Oxcart was an analysis suite and a work proposal for an emitter.

To recap the previous post: Clojure uses Var objects, an atomic, threadsafe reference type with support for naming and namespace binding semantics to create a literal global hashmap from symbols to binding vars, with a literal stack of thread local bindings. These vars form the fundamental indirection mechanism by which Clojure programs map from textual symbols to runtime functions. Being atomically mutable, they also serve to enable dynamic re-binding and consequently enable REPL driven development.

But for this second aspect of providing for dynamic redefinition of symbols, Clojure could be statically compiled eliminating var indirection and achieving a performance improvement.

Moreover, in the style of Clojurescript, exposing the full source of the language to an agressive static compiler could yield total program size improvements in comparison to programs running on the official Clojure compiler/runtime pair.

So. This was the premise upon which my Project Oxcart GSoC began. Now, standing near the end of GSoC what all has happened, where does the project stand and what do I consider results?

As of this post’s writing, e7a22a09 is the current state of the Oxcart project. The Oxcart loader and analyzer, built atop Nicola Mometto’s tools.analyzer.jvm and tools.emitter.jvm, is capable of loading and generating a whole program AST for arbitrary Clojure progrms. The various passes in the oxcart.passes subsystem implement a variety of per-form and whole program traversals including λ lifting, use analysis, reach analysis and taken as value analysis. There is also some work on a multiple arity function reduction system, but that seems to be a problem module at present. The oxcart.emitter subsystem currently features two emitters, only one of which I can claim is of my authoring. oxcart.emitter.clj is a function from an Oxcart whole program AST to a (do) form containing source code for the entire program as loaded. This has primarily been a tool for debugging and ensuring the sanity of various program transformations.

The meat of Oxcart is oxcart.emitter.jvm, a wrapper around a modified version of Clojure.tools.jvm.emit which features changes for emitting statically specialized bytecode which doesn’t make use of var indirection. These changes are new, untested and subject to change but they seem to work. As evidenced by the bench-vars.sh script in the Oxcart project’s root directory, for some programs the static target linking transform done by Oxcart can achieve a 24% speedup.

Times in ms.
Running Clojure 1.6.0 compiled test.vars....
1603.894357
1369.60414
1348.747718
1345.55669
1343.380462
1341.73058
1346.342237
1345.655224
1395.983262
1340.063011
1339.7823
1399.282023
1451.58462
1353.231654
1348.458151
Oxcart compiling test.vars....
Running Oxcart compiled test.vars....
1256.391888
1066.860551
1033.764971
1031.380052
1032.09911
1030.032666
1040.505754
1040.419533
1041.749353
1040.038301
1041.699772
1044.441135
1095.809551
1072.494466
1047.65782

How/why is this possible? The benchmark above is highly artificial in that it takes 500 defs, and tests over several thousand iterations of selectively executing half of them at random. This benchmark is designed to exaggerate the runtime cost of var indirection by being inefficient to inline and making great use of the (comparatively) expensive var dereference operation.

So what does this mean for Clojure? Is Oxcart proof that we can and should build a faster Clojure implementation or is there a grander result here we should consider?

Clojure’s existing compiler operates on a “good enough” principle. It’s not especially nice to read nor especially intelligent, but it manages to produce reasonably efficient bytecode. The most important detail of the reference Clojure compiler is that it operates on a form by form basis and is designed to do so. This is an often forgotten detail of Clojure, and one which this project has made me come to appreciate a lot more.

When is static compilation appropriate and valuable? Compilation is fundamentally an analysis and specialization operation designed to make a trade off between “start” or “compile” time and complexity and runtime performance. This suggests that in different contexts different trade offs may be appropriate. Furthermore, static compilation tends to inhibit program change. To take the extreme example of say C code which is directly linked change in a single function, if a single function increases in bytecode size so that it cannot be updated in place. In this case all code which makes use of it (worst case the rest of the program) must be rewritten to reflect the changed location of the changed function. While it is possible to build selective recompilation systems which can do intelligent and selective rebuilding (GHCI being an example of this), achieving full compilation performance at interactive development time is simply a waste of time on compilation when the goal of REPL driven development is to provide rapid feedback and enable exploration driven development and problem solving.

While vars are clearly inefficient from a perspective of minimizing function call costs, they are arguably optimal in terms of enabling this development style. Consider the change impact of altering a single definition on a Clojure instance using var indirection rather than static linking. There’s no need to compute, recompile and reload the subset of your program impacted by this single change. Rather the single changed form is recomputed, and the var(s) bound by the recompiled expression are altered. In doing so, no changes need be made to the live state of the rest of the program. When next client code is invoked the JVM’s fast path if any will be invalidated as the call target has changed, but this is handled silently by the runtime rather than being a language implementation detail. Furthermore var indirection means that compiling an arbitrary Clojure form is trivial. After handling all form local bindings (let forms and soforth), try to resolve the remaining expressions to vars in the mapped namespace, and to a class if no var is found. Brain dead even, but sufficient and highly effective in spite of its lack of sophistication.

While, as Oxcart is proof, it is possible to build a static compiler for some subset of Clojure, doing so produces not a static Clojure but a different language entirely because so much of the Clojure ecosystem and standard library even are defined in terms of dynamic redefinition, rebinding and dispatch. Consider for instance the multimethod system, often lauded with the claim that multimethod code is user extensible. Inspecting the macorexpand of a defmethod form you discover that its implementation far from being declarative is based on altering the root binding of the multimethod to install a new dispatch value. While it would be possible to statically compile this “feature” by simply collecting all defmethods over each multimethod and AOT computing the final dispatch table, however this is semantics breaking as it is strictly legal in Clojure to dynamically define additional multimethod entries just as it is legal to have an arbitrary do block containing defs.

So, in short, by trying to do serious static compilation you largely sacrifice REPL development and to do static compilation of Clojure you wind up defining another language which is declarative on defs, namespaces and dependencies rather than imperative as Clojure is. I happen to think that such a near-Clojure static language, call it Clojurescript on the JVM, would be very interesting and valuable but Clojure as implemented currently on the JVM is no such thing. This leads me to the relatively inescapable conclusion that building a static compiler for Clojure is putting the cart before the ox since the language simply was not designed to benefit from it.

Moving Forwards

Now. Does this mean we should write off tools.emitter.jvm, Clojure in Clojure and Oxcart? By no means. tools.emitter.jvm uses the same var binding structure and function interface that Clojure does. Moreover it’s a much nicer and more flexible single form level compiler that I happen to think represents a viable and more transparent replacement for Clojure.lang.Compiler.

So what’s that leave on the table? Open question. Clojure’s compiler has only taken major changes from two men: Rich and Stu. While there is merit in considered design and stability, this also means that effort towards cleaning up the core of Clojure itself and not directed at a hard fork or redesign of Clojure is probably wasted especially in the compiler.

Clearly while var elimination was a successful optimization due to the value Clojure derives from the Var system it’s not a generally applicable one. However it looks like Rich has dug the old invokeStatic code back out for Clojure 1.7 and the grapevine is making 10% noises, which is on par with what Oxcart seems to get for more reasonable inputs so we’ll see where that goes.

While immutable datastructures are an awesome abstraction, Intel, AMD and ARM have gotten very good at building machines capable of exploiting program locality for performance and this property is fundamentally incompatible with literal immutability. Transients can help mitigate Clojure’s memory woes, and compiler introduction of transients to improve memory performance could be interesting. Unfortunately again this is a whole program optimization which clashes with the previous statement of the power that Clojure derives from single form compilation.

Using core.typed to push type signatures down to the bytecode level would be interesting, except that since almost everything in Clojure is a boxed object and object checkcasts are cheap and this would probably result in little performance improvement unless the core datatypes were reworked to be type parametric. Also another whole program level transform requiring an Oxcart style whole program analysis system.

The most likely avenue of work is that I’ll start playing with a partial fork of Clojure which disassociates RT from the compiler, data_readers.clj, user.clj and Clojure/core.clj. While this will render Oxcart even more incompatible with core Clojure, it will also free Oxcart to emit Clojure.core as if it were any other normal Clojure code including tree shaking and escape the load time overhead of bootstrapping Clojure core entirely.

This coming Monday I’ll be giving a talk on Oxcart at the Austin TX Clojure meetup, so there’s definitely still more to come here regardless of the Clojure/Conj 14 accepted talk results.