牛: the environment model

27 Oct 2014

Disclaimer: Oxlang is vaporware. It may exist some day, there is some code, however it is just my thought experiment at polishing out some aspects of Clojure I consider warts by starting from a tabula rasa. The following represents a mostly baked scheme for implementing ns and require more nicely in static (CLJS, Oxcart) rather than dynamic (Clojure) contexts.

Unlike Clojure in which the unit of compilation is a single form, Oxlang’s unit of compilation is that of a “namespace”, or a single file. Oxlang namespaces are roughly equivalent to Haskell modules in that they are a comprised of a “header”, followed by a sequence of declarative body forms.

In Clojure, the ns form serves to imperatively create and initialize a namespace and binding scope. This is done by constructing a new anonymous function, using it as a class loader context to perform cached compilation of depended namespaces. Subsequent forms are compiled as they occur and the results are accumulated as globally visible defs.

Recompiling or reloading a file does exactly that. The ns form is re-executed, incurring more side-effects, and all forms in the file are re-evaluated generating more defs. However this does not discard the old defs from the same file, nor purge the existing aliases and refers in the reloaded namespace. This can lead to interesting bugs where changes in imports and defs create name conflicts with the previous imports and cause reloading to fail. The failure to invalidate deleted defs also creates conditions where for instance during refactorings the old name for a function remains interred and accessible the program run time allowing evaluation of code which depends on the old name to succeed until the entire program is reloaded in a fresh run time at which point the missing name will become evident as a dependency fault.

Furthermore, the Var mechanism serves to enable extremely cheap code reloading because all bindings are dynamically resolved anyway. This means that there is exactly zero recompilation cost to new code beyond compilation of the new code itself since the Var look up operation is performed at invoke time rather than at assemble time.

Unfortunately in my Clojure development experience, the persistence of deleted symbols resulted in more broken builds than I care to admit. Building and maintaining a dependency graph between symbols is computationally inexpensive, is a key part of many language level analyses for program optimization and here critically provides better assurance that REPL development behavior is identical to program behavior in a cold program boot context.

In order to combat these issues, two changes must be made. First, re-evaluating a ns form must yield a “fresh” environment that cannot be tainted by previous imports and bindings. This resolves the import naming conflict issues by making them impossible. By modeling a “namespace” as a concrete “module” value having dependencies, public functions and private functions we can mirror the imperative semantics enabled by Clojure’s defs and Vars simply by accumulating “definitions” into the “module” as they are compiled.

This model isn’t a total gain however due to the second change, that reloading entirely (and deliberately) invalidates the previous definitions of every symbol in the reloaded namespace by swapping out the old namespace definition for the new one. This implies that other namespaces/modules which depend on a reloaded module must themselves be reloaded in topological sort order once the new dependencies are ready requiring dependency tracking and reloading infrastructure far beyond Clojure’s (none). Naively this must take place on a file by file basis as in Scala, however by tracking file change time stamps of source files and the hash codes of individual def forms a reloading environment can prove at little cost that no semantic change has taken place and incur the minimum change cost. I note here the effectiveness of GHCI at enabling interactive development under equivalent per-file reloading conditions as evidence that this model is in fact viable for enabling the interactive work flow that we associate with Clojure development.

With “namespaces” represented as concrete immutable values, we can now define namespace manipulation operations such as require and def in terms of functions which update the “current” namespace as a first class value. A def when evaluated simply takes a namespace and returns a new namespace that “happens” to contain a new def. However the work performed is potentially arbitrary. refer, the linking part of require, can now be implemented as a function which takes some enumeration of the symbols in some other namespace and the “current” environment, then returns a “new” environment representing the “current” environment with the appropriate aliases installed.

This becomes interesting because it means that the return value of load need lot be the eval result of the last form in the target file, it can instead be the namespace value representing the final state of the loaded module. Now, given a caching/memoized load (which is require), we can talk about an “egalitarian” loading system where user defined loading paths are possible because refer only needs the “current” namespace, a “source” namespace and a spec. Any function could generate a “namespace” value, including one which happens to perform loading of an arbitrary file as computed by the user. See technomancy’s egalitarian ns for enabling the hosting of multiple versions of a single lib simultaneously in a single Clojure instance is one possible application of this behavior.

It is my hope that by taking this approach the implementation of namespaces and code loading can be simplified greatly however one advantage of the Var structure is that it enables forwards and out of order declarations which is immensely useful while bootstrapping a language run time ex nihilo, as done here in the Clojure core. ns itself must exist in the “empty” namespace, otherwise as the “empty” namespace is used to analyze the first form in a file (stateless (abstractly) compiler ftw) the ns form itself will not be resolved and no program can be constructed. Ox could follow Clojure’s lead and cheat by not implementing ns in Ox but rather bootstrapping it from Java or Clojure or whatever the implementing language turns out to be but I’d like to do better than that. This is a problem I haven’t solved yet.