A Better VM

For the last couple of years I've been working with Clojure, a Lisp which runs on top of the JVM. My reservations about Clojure itself and about Clojure's maintainership are fairly well established at this point. However, I'd be lying if I said that, after thinking long and hard about the way I want to develop software, I've come up with anything both incrementally achievable and better. Clojure's syntax is convenient. Its data structures are clever. Its immutable defaults are sane compared to any other language's. Its integration with the JVM, while fatal to its own semantics, ensures unmatched leverage. In short, I don't know if it's possible to do better atop the JVM.

But why use the JVM?

The JVM itself is a wonder of engineering, compiler technology and language research. But the JVM standard library is deeply mutable and makes assumptions about the way we can and should program that aren't true anymore. While the JVM itself may be awesome, I'm just not convinced that the object/class library it comes with is something you'd actually want around as a language designer, especially as the designer of a functionally oriented language or of a language with a different typing/dispatch model than Java's.

The conclusion I personally came to was that, faced with what already exists on the JVM, I couldn't muster the wherewithal to work in that brownfield. I needed a greenfield: somewhere I could explore, have fun exploring, and make mistakes.


I honestly don't remember why I chose the name Dirt, but it's stuck in my head and adorns the git repo where all of this lives. Dirt isn't something I'm gonna be releasing on GitHub, although you can totally browse the code on git.arrdem.com. Simply put, Dirt isn't intended for anyone else to use or contribute to, now or in the foreseeable future. It's my experiment, and I think my total control over the project is important to, well, finishing it sometime.

So what's the goal?

Versioning is something I think is really important at both the package and the compilation unit level. I previously wrote about artifact versioning, and experimented with versioned namespaces in my fork of Clojure. Unfortunately, as a user experience, versioned namespaces didn't pan out. There were too many corner cases, and too many ways that partial recompilation could occur and generate disruptive, useless warnings. So versioning is one major factor in Dirt's design.

Another is functional programming. After working with the JVM, I'm pretty much convinced that design by inheritance is just flat out wrong. While I was working with Dr Perry, he shared an awesome paper with me: Object-Oriented programs and Testing. The paper's essential point is that adequately testing inheritance-structured programs is really hard and everybody does it wrong. Testing mutable programs is already hard enough, and single inheritance poses so many design difficulties that it just doesn't seem worthwhile. In my time using Clojure, I found that I never actually needed inheritance. Interfaces sufficed, and functions which operated against interfaces were better captured as functions in namespaces than as inherited static functions packaged away in some base class or helper package.

The failure mode of interfaces as presented by Java, however, is that they are closed. Only the implementer of a type can make it participate in an interface. This is far too restrictive. Haskell-style typeclasses get much closer to my idea of what interfaces should look like, in terms of being open to extension/implementation over other types and carrying contracts.

Which brings me to my final goal: contracts, or at least annotations. I like types. I like static dispatch. While I can work in dynlangs, I find that the instant my code stops being monomorphic, at least with respect to some existential type constructed of interfaces/protocols, I start tearing my hair out. Types are great for dispatch, but there's lots and lots of other static metadata about programs which one could potentially do dataflow analysis with. For all that I think Shen is batshit crazy and useless, the fact that it provides a user-extensible type system which can express annotations is super interesting. Basically, I think that if programmers are given better-than-Clojure program metadata and tools for interacting with that metadata, it would be possible to push back against @Jonathan_Blow's excellent observation that comments and documentation always go stale and become worthless.

So what's the architecture?

DirtVM so far is just a collection of garbage collectors and logging interfaces I've built. All of the rest of this is hot air I hope to get to some day.

The fundamental data architecture is something like this:

  Module
    Group, Package
    Terms: (GPL1 | GPL2 | EPL | MIT | Other)
    (bag of attributes)
    List[(Name, Version)]

  Namespace
    Imports: List[Import]
    Types, Ifaces, Fns

  Import
    Namespace, Alias

The essential idea is that because versioning is so hard, it's easier to fix the runtime to allow co-hosting of multiple versions of artifacts than to somehow try and solve the many, many difficulties of software versioning and artifact development. Java 9 modules look really, really good and come close to being an appropriate solution, but the Java team have abandoned the idea of versioned modules. In Dirt, when code is compiled within a namespace it has access only to what has been explicitly imported by that namespace. Imports are restricted to the contents of the module and the module's direct dependencies. It is not possible for a namespace to import a transitive dependency. This means that at all times a user is in direct control of what version of a function or a type they are interacting with. There is no uncertainty.
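To make the import rule concrete, here is a minimal sketch in Python, used only as a sketching language. The `Module` record, the `(group, package, version)` key and `resolve_import` are hypothetical names, not Dirt's design. It shows a module importing from a direct dependency while an import of a transitive dependency is rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical module record: identity plus *direct* dependencies only."""
    group: str
    package: str
    version: str
    deps: tuple = ()  # tuple of Module

    @property
    def key(self):
        return (self.group, self.package, self.version)


def importable(module):
    """Keys of the modules whose namespaces `module` may import."""
    return {module.key} | {dep.key for dep in module.deps}


def resolve_import(module, target):
    # Transitive dependencies are excluded on purpose, so an importer always
    # knows exactly which version of a type or function it is using.
    if target.key not in importable(module):
        raise ImportError(f"{target.key} is not a direct dependency of {module.key}")
    return target


c = Module("org.example", "c", "1.0.0")
b = Module("org.example", "b", "2.1.0", deps=(c,))
a = Module("org.example", "a", "0.1.0", deps=(b,))

resolve_import(a, b)      # allowed: b is a direct dependency of a
try:
    resolve_import(a, c)  # rejected: c is only reachable through b
except ImportError as err:
    print(err)
```

If `a` really needs something from `c`, the sketch forces `a` to declare `c` as a direct dependency, which pins the version it sees.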

This gets a little sticky for data structures. If module A depends directly on B, and B depends directly on C, it's possible that B will return into A a data structure, function or closure which comes from module C. This turns out to work fine. Within a single module, protocol/interface dispatch is done only against the implementation(s) visible in that module's scope. Because A has no knowledge at compile time of any such type from C, it can't do anything with such a value except hand it back to B, which can use it.
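A rough Python sketch of that scoping rule, assuming each module carries its own table of interface implementations keyed by (interface, type key). `ModuleScope`, `C_POINT` and the `Show` interface are hypothetical. A can hold a value of C's type and pass it back to B, but cannot dispatch on it itself.

```python
class ModuleScope:
    """Hypothetical per-module instance table: (interface, type key) -> impl."""

    def __init__(self, name, instances=None):
        self.name = name
        self.instances = dict(instances or {})

    def dispatch(self, iface, type_key, payload):
        # Only implementations visible in *this* module's scope are consulted.
        impl = self.instances.get((iface, type_key))
        if impl is None:
            raise TypeError(f"{self.name}: no {iface} instance for {type_key}")
        return impl(payload)


# Hypothetical type key for a record type defined in module C.
C_POINT = ("org.example/c", "1.0.0", "c.core", "Point")

# B depends on C, so C's Show instance for Point is visible in B.
scope_b = ModuleScope("b", {("Show", C_POINT): lambda xy: f"Point{xy}"})
# A depends only on B and sees no instances for C's types.
scope_a = ModuleScope("a")


def b_make_point(x, y):
    """B returns a value whose type comes from C."""
    return (C_POINT, (x, y))


def b_show(value):
    type_key, payload = value
    return scope_b.dispatch("Show", type_key, payload)


p = b_make_point(1, 2)  # A may hold this value opaquely...
print(b_show(p))        # ...and hand it back to B, which can use it.
```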

Types and interfaces are very Haskell style. Mutability will be supported, but avoided in the standard library wherever possible. Interfaces will use typeclass-style pattern-matching dispatch, not dispatch determined by the call target. This makes real contracts like Eq possible and extensible, rather than totally insane like non-commutative object equality. Types are just records, and will be internally named and distinguished by package, version, namespace and name. This makes it possible to co-host multiple consistent implementations of the same interface against different versions of the same type. Much in the Haskell style, the hope is that for the most part interface dispatch can be statically resolved.
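The following Python sketch illustrates versioned type identity plus an Eq-style instance chosen by argument type rather than by a receiver object. Python can only approximate static resolution with a runtime lookup, and `TypeId`, `Record`, `instance_eq` and `eq` are hypothetical names. Two versions of "the same" `User` type coexist, each with its own Eq instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeId:
    """A type's identity: package, version, namespace and name."""
    package: str
    version: str
    namespace: str
    name: str


@dataclass(frozen=True)
class Record:
    type_id: TypeId
    fields: tuple


# Hypothetical Eq instance registry, keyed by the full versioned TypeId.
EQ_INSTANCES = {}


def instance_eq(type_id, impl):
    EQ_INSTANCES[type_id] = impl


def eq(x, y):
    # The instance is chosen from the argument types, not from a receiver
    # object. Values of different types are never equal, so swapping the
    # arguments cannot change the answer as long as each instance is symmetric.
    if x.type_id != y.type_id:
        return False
    return EQ_INSTANCES[x.type_id](x.fields, y.fields)


USER_V1 = TypeId("org.example/users", "1.0.0", "users.core", "User")
USER_V2 = TypeId("org.example/users", "2.0.0", "users.core", "User")

# Two versions of "the same" type co-hosted, each with its own Eq instance.
instance_eq(USER_V1, lambda a, b: a[0] == b[0])  # v1: compare ids only
instance_eq(USER_V2, lambda a, b: a == b)        # v2: compare all fields

old = Record(USER_V1, (42,))
new_a = Record(USER_V2, (42, "a@example.com"))
new_b = Record(USER_V2, (42, "b@example.com"))
```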

Why record types instead of a Lua- or Python-style dynamic object dispatch system? Because after working with Clojure for a while now, it's become clear to me that whatever advantages dynamic typing may offer in the small are entirely outweighed by static typing's advantages in the large, and that packaging functions into objects and out of namespaces buys you nothing. While dynamic typing and instance dispatch can enable open dispatch, they also defeat case-based reasoning when reliability is required. Frankly, my most reliable Clojure code would have translated directly to Haskell or OCaml. Refactoring matters, especially as projects grow. Being able to induct someone else into your code matters. Being able to produce meaningful errors someone can understand and trace to a root cause requires information equivalent to types. Dynamic typing just obscures static contracts and enables violations to occur inadvertently, leaning on exhaustive test coverage. Dynamic typing introduces concrete runtime costs, and slows down program evolution because building tools is simply harder. Tools matter, so static typing, ho!

In addition to interfaces/typeclasses, there are also fns (fn in Clojure), which are statically typed, non-extensible, single-arity procedures. Despite their impurity, the term function is used for these in keeping with industry convention.

The namespaces are very much Clojure style, because I've been really happy with the way Clojure's namespaces work out for the most part, and I want to support a language which isn't that distant from Clojure, at least syntactically and in its namespace system. Import renaming is awful, but qualified imports are fine, hence imports support aliases.

The ultimate goal of this project is to be able to present a virtual machine interface which is itself versioned. Imagine if you could write software which used dependencies themselves targeting incompatible/old versions of the standard library! That would solve the whole problem of language and library evolution being held to the lowest common denominator.

Dirt itself will be a garbage-collected, mostly statically typed bytecode VM much like the JVM. It's probably gonna get an SSA/dataflow bytecode-level representation rather than a stack machine structure, but that level of detail I'll figure out when I get to it. For now I'm having fun writing C and garbage collectors. The next step will probably be some pre-Dirt language to help me generate C effectively.

Here's to log cabins and projects of passion!


(more frustration)

Ferd T-H was kind enough to perfectly voice one of my long-standing frustrations with Clojure (and, it feels like, with many small programming communities), and I couldn't resist sharing it.

A community of people fine with inadequate tooling/docs/attitude/etc overlooks those who avoided it for that reason. Survivorship bias? Then again, reframe the question as a trial by fire and dealing with less than ideal conditions becomes a badge of honor. Which in turns possibly only breeds more tolerance for entrenched inadequacy. You need fresh eyes to point out your slanted perspective.

Via link

My criticism of any "design for experts" philosophy may be inferred.


Immutable Env Things

As with some of my other posts, this was originally an email which turned into enough of a screed I thought it would be of general interest and worth posting. Some op/ed material has been removed, but the technical details are unaltered if not clarified.

So here's the rundown I've got so far on a lisp with immutable environments. I'm merely gonna sketch at some stuff, because giving a full treatment would require dusting off and finishing a bunch of stuff I came up with for Ox and put down again.

I also threw out the Haskell sketch I was dinking with because it was just distracting from writing this :P

So let's start from the top.

Clojure is a form-at-a-time language which happens to support reading files of forms (or rather Readers which produce textual representations of forms, where they come from is magical).

Binding is achieved through a global, thread shared, transactional, mutable mapping from Symbols to Namespaces, which are themselves transactional mappings from Symbols to Vars, which are themselves a triple (QualifiedSymbol, Metadata, Binding). Vars happen to also support dynamic binding (pushing and popping), but this is less used. From here on out I'll treat them simply as a global mostly static binding mechanism, which is their primary use case anyway.

Control and local bindings are achieved entirely using lambda/fn forms compiled to JVM methods, produced by the opaque compiler as opaque IFn/AFn objects. Top level forms are compiled and executed for effect by being wrapped in an anonymous (fn [] <form>) which can be blindly invoked by the compiler to realize whatever effects the form may have.

Circular dependencies are achieved via Var indirection. A Var is first created with no binding, so that it can be referenced. With that environment binding in place, code which will depend on (call) the Var's final bound value can be compiled, since it just needs the target Var to exist in order to compile. The depended-on Var may then be redefined with both itself and its dependent visible in the compilation context, and so the two Var bindings can be mutually recursive.
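The declare-then-define dance can be sketched in Python with an explicit mutable cell standing in for a Clojure Var. Plain Python functions already look up globals late, so the cell is there only to mirror the mechanism. `Var`, `bind` and `deref` are illustrative names here, not Clojure's Java API.

```python
class Var:
    """A mutable cell: a qualified name plus a (possibly absent) binding."""
    _UNBOUND = object()

    def __init__(self, qualified_name):
        self.name = qualified_name
        self.root = Var._UNBOUND

    def bind(self, value):
        self.root = value

    def deref(self):
        if self.root is Var._UNBOUND:
            raise RuntimeError(f"Unbound var: {self.name}")
        return self.root


# 1. Create is-odd? with no binding, like (declare is-odd?).
is_odd = Var("user/is-odd?")

# 2. Compile is-even? against the Var itself, not its (missing) value.
is_even = Var("user/is-even?")
is_even.bind(lambda n: True if n == 0 else is_odd.deref()(n - 1))

# 3. Now bind is-odd?; the two definitions can call each other.
is_odd.bind(lambda n: False if n == 0 else is_even.deref()(n - 1))

print(is_even.deref()(10), is_odd.deref()(7))  # True True
```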

While traditional and lispy, this approach has a number of problems which I'm sure I don't need to reiterate here. The goals of an immutable namespace system then should be to

  1. Make Namespaces the compilation unit rather than Forms
  2. Make Namespaces reloadable (change imports/exports/aliases w/o system reboot)
  3. Enable Namespace compilation to fail sanely
  4. Enable refactoring/analysis

The sketch of this I've been playing with is as follows:

Let's assume an interpreter. If you can interpret, you can compile as an implementation detail of eval, and interpretation is easier to implement initially. Assume a reader. So we read a form, then we have to evaluate it. Evaluation is first macroexpansion, then sequential evaluation of each resulting form in the environment. Evaluation at the top level is eval :: Env -> Form -> Env (we discard the result), which is fine. Because we aren't really doing compilation, only interpretation, Clojure's tactic of having Vars and analyzing against Var bindings works 1:1: create an unbound symbol, analyze/bind against it, then add a binding later.
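A toy version of top-level eval :: Env -> Form -> Env in Python, with the environment as a read-only mapping. The form language (ints, symbol strings, `("+", a, b)` and `("def", sym, expr)`) is hypothetical and macroexpansion is omitted. Each top-level form yields a new environment and leaves the old one untouched.

```python
from types import MappingProxyType


def eval_expr(env, form):
    """Evaluate an expression in a hypothetical toy form language:
    ints, symbol strings, and ("+", a, b)."""
    if isinstance(form, int):
        return form
    if isinstance(form, str):
        return env[form]
    op, a, b = form
    if op == "+":
        return eval_expr(env, a) + eval_expr(env, b)
    raise ValueError(f"unknown operator: {op}")


def eval_top(env, form):
    """eval :: Env -> Form -> Env. Macroexpansion is omitted; a top-level
    form's value is discarded and only its effect on the env is kept."""
    if isinstance(form, tuple) and form[0] == "def":
        _, sym, expr = form
        return MappingProxyType({**env, sym: eval_expr(env, expr)})
    eval_expr(env, form)  # evaluated for effect only
    return env


env0 = MappingProxyType({})
env1 = eval_top(env0, ("def", "x", 3))
env2 = eval_top(env1, ("def", "y", ("+", "x", 4)))
```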

The (global) environment structure I had in mind is pretty simple: one unqualified symbol names the current namespace (*ns*), and a mapping exists from unqualified symbols to Namespaces.

Namespaces are essentially compilation contexts, and really just serve to immutably store the alias mapping, import mapping, the mapping of the Vars in the namespace to bindings, and a list of the public Vars/exports from the namespace.

Vars are a fully qualified name, metadata and a value. That's it.
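Those three structures map naturally onto frozen records. Here is a Python sketch of them, with `intern` as a hypothetical helper that returns a new environment instead of mutating anything. The field names follow the description above, and everything else is illustrative.

```python
from dataclasses import dataclass, field, replace
from types import MappingProxyType


def frozen_map(d=None):
    return MappingProxyType(dict(d or {}))


@dataclass(frozen=True)
class Var:
    qualified_name: str
    meta: MappingProxyType
    value: object


@dataclass(frozen=True)
class Namespace:
    name: str
    aliases: MappingProxyType = field(default_factory=frozen_map)  # alias -> ns
    imports: MappingProxyType = field(default_factory=frozen_map)  # sym -> qualified sym
    vars: MappingProxyType = field(default_factory=frozen_map)     # sym -> Var
    exports: frozenset = frozenset()                               # public syms


@dataclass(frozen=True)
class Env:
    current_ns: str               # plays the role of *ns*
    namespaces: MappingProxyType  # unqualified sym -> Namespace


def intern(env, sym, value, meta=None, public=True):
    """Return a new Env with sym bound in the current namespace."""
    ns = env.namespaces[env.current_ns]
    var = Var(f"{ns.name}/{sym}", frozen_map(meta), value)
    new_ns = replace(
        ns,
        vars=frozen_map({**ns.vars, sym: var}),
        exports=ns.exports | {sym} if public else ns.exports,
    )
    return replace(env, namespaces=frozen_map({**env.namespaces, ns.name: new_ns}))


env0 = Env("user", frozen_map({"user": Namespace("user")}))
env1 = intern(env0, "x", 3)
```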

Compilation occurs at the namespace level. All the forms in a namespace are sequentially read and evaluated in interpretation mode. All macros expand normally, and type hints are processed; they just don't do anything.

Once the whole namespace has been successfully loaded in interpretation mode, a second pass may be made in which static linking is performed. Because everything in the namespace has been fully realized already it's possible to statically link even mutually recursive calls within the same namespace. Full type inference within the namespace is also possible here, etc.

Aside: Interestingly because it is single pass, the existing Clojure compiler doesn't (can't) handle hinted mutually recursive function calls at all. It relies on Var metadata to store arglists (and hints for primitive invocations), so in order to emit a hinted recursive call the forward declaration of the var has to carry the same ^:arglists metadata (hints and all!) which will be added by defn whenever it is finally defined.

Once a namespace has been fully loaded and compiled (including the second pass), the global environment with the resulting namespace and var mappings is the result of whatever caused loading. If an error occurs during compilation, we just abort and return the environment state from before anything else happened. I think there are tricks to be played with generated garbage class names and classloader isolation here so that this works even during the second, static compile pass, but it should be possible to fully clean up a failed compile.
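The two-pass, all-or-nothing load can be sketched in Python. Each definition is reduced to a hypothetical `(symbol, referenced_symbols)` pair, and linking is reduced to checking that every reference resolves. Pass 1 binds everything, so pass 2 can link mutually recursive definitions. On failure the caller gets back the very same environment object, since nothing was ever mutated.

```python
from types import MappingProxyType


class CompileError(Exception):
    pass


def load_namespace(env, defs):
    """Load a namespace as one unit against an immutable env.

    defs: list of (symbol, referenced_symbols) pairs, a hypothetical stand-in
    for analyzed top-level definitions.
    Returns (new_env, None) on success or (env, error) on failure.
    """
    working = dict(env)
    try:
        # Pass 1: "interpret" each definition in order, binding its symbol.
        for sym, _refs in defs:
            working[sym] = ("defined", sym)
        # Pass 2: every definition is now realized, so references can be
        # linked statically, including mutually recursive ones.
        for sym, refs in defs:
            missing = sorted(r for r in refs if r not in working)
            if missing:
                raise CompileError(f"{sym} references unbound {missing}")
    except CompileError as err:
        # Abort: the caller gets back the environment exactly as it was.
        return env, err
    return MappingProxyType(working), None


base = MappingProxyType({"inc": ("builtin", "inc")})
ok_env, ok_err = load_namespace(base, [("even?", ["odd?", "inc"]), ("odd?", ["even?"])])
bad_env, bad_err = load_namespace(base, [("f", ["g"])])
```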

So this works really well for module loading, and even for macros which expand into multiple def forms. This model of eval :: env -> form -> (env, result) starts to break down when you want to talk about macros that do evaluation during code loading, and likewise the REPL. Basically, your interpreter winds up holding an additional reference, *env* or something, which is intrinsically mutable but which references an immutable environment and holds, shall we say, the state continuation of the entire REPL. Consider user code which actually calls eval during execution. Whatever state changes that evaluation creates need to be preserved in the "global" continuation.
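A small Python sketch of the mutable *env* reference over immutable environments. `Runtime` and the toy `("def" | "get" | "call")` forms are hypothetical. The subtle part is in the `"call"` case: user code may call eval itself, so evaluation must continue from the environment that nested eval left behind. Returning the stale environment would silently drop the nested definition.

```python
from types import MappingProxyType


def eval_form(env, form, rt):
    """eval :: Env -> Form -> (Env, result) for a hypothetical toy language."""
    tag = form[0]
    if tag == "def":   # ("def", sym, value)
        _, sym, value = form
        return MappingProxyType({**env, sym: value}), sym
    if tag == "get":   # ("get", sym)
        return env, env[form[1]]
    if tag == "call":  # ("call", fn): user code that receives the runtime
        result = form[1](rt)
        # fn may itself have called rt.eval. Continue from the env it left
        # behind, not the stale `env` we were handed, or its defs are lost.
        return rt.env, result
    raise ValueError(f"unknown form: {tag}")


class Runtime:
    def __init__(self, env):
        self.env = env  # the one mutable reference, playing the role of *env*

    def eval(self, form):
        self.env, result = eval_form(self.env, form, self)
        return result


rt = Runtime(MappingProxyType({}))
rt.eval(("def", "x", 1))
before = rt.env
# User code that calls eval while it runs, e.g. a macro evaluating at load time.
rt.eval(("call", lambda r: r.eval(("def", "y", r.eval(("get", "x")) + 1))))
```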

In this model when you compile a form at the REPL, the runtime could obviously interpret (this may be appropriate for macroexpansion) and can also compile. This is only tricky when the user recompiles a symbol which was already bound/compiled. In this case, the runtime could either eagerly recompile all the code which links against this function using the same interpret then statically link tactic or could just invalidate any compiled bytecodes and lazily recompile later. The former is probably more predictable.

Once a namespace has been reloaded/altered, all namespaces importing it must also be recompiled in topsort order using the same tactics. That we already have per-symbol and per-module dependency information helps with building refactoring tools, which otherwise have to derive all this themselves. Ideally, top-level expressions/definitions would also expose local variable tables and local variable dataflow graphs so that analysis there is also possible.
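Computing that recompile order is a small graph problem. Below is a Python sketch (3.9+ for `graphlib`) where `recompile_order` and the sample namespace names are hypothetical. It finds everything that transitively imports the changed namespace, then orders those namespaces so each one comes after what it imports.

```python
from graphlib import TopologicalSorter


def recompile_order(imports, changed):
    """Namespaces to recompile after `changed` is reloaded, in dependency order.

    imports: dict mapping each namespace to the set of namespaces it imports.
    """
    # Invert the import graph to find everything downstream of `changed`.
    importers = {}
    for ns, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(ns)
    affected, stack = set(), [changed]
    while stack:
        ns = stack.pop()
        if ns not in affected:
            affected.add(ns)
            stack.extend(importers.get(ns, ()))
    # TopologicalSorter takes node -> predecessors, i.e. ns -> its imports,
    # restricted to the affected namespaces.
    graph = {ns: imports.get(ns, set()) & affected for ns in affected}
    return list(TopologicalSorter(graph).static_order())


imports = {
    "app.main": {"app.db", "app.util"},
    "app.db": {"app.util"},
    "app.util": set(),
    "app.cli": {"app.main"},
    "app.unrelated": set(),
}
order = recompile_order(imports, "app.db")  # app.db, then app.main, then app.cli
```

Unaffected namespaces such as `app.util` and `app.unrelated` are left alone, which is what makes this cheaper than a full reload.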

Aside: It may be possible to allow cyclic dependencies between namespaces (see submodules in Racket for some study of this problem). In the common case it may well be that macros are well behaved and that cyclic dependencies between modules work out just fine. However, because macros perform arbitrary computation, it's also pretty easy to cook up pathological cases where macro-generated code never reaches a fixed point in repeated interpretive analysis which can be statically compiled. For this reason I'm inclined towards formalizing a ns or module special form and throwing out require, refer, import and friends as legal top-level forms altogether. Namespaces should be declarative as in Java/Haskell.

There's probably a way which I just haven't seen yet to treat updates to the "global" namespace mapping and updates within a namespace the same way via the same mechanism since they're both dependent on the same dependency tracking and recompile machinery.

Writing this has made me think about having an ImmutableClassLoader which is derived entirely from an immutable environment and loads classes by lazily (statically) compiling forms.

That's kinda all I got. ztellman gets credit for the mutable *env* in the repl bit which I spent literally weeks trying to do without last year. Maybe something with monad transformers can get you there but I couldn't figure it out. mikera has some sketches of all this with types in his KISS project.

I've already prototyped an environment system that looks a lot like this in the Ox repo if you want to go dig. Originally there was ox/lang/environment.clj; then I started dinking with a Java implementation of the environment structure, ox/lang/environment, which also has a couple of different stack-based contextual binding types for the interpreter.

As written about in the Jaunt essays, I've kinda concluded that whatever this immutable ns language winds up looking like is so divorced from Clojure, or at least from the way most people write Clojure, that you'd lose so much value in the changeover that it won't pay off for a very long time. The mount library is a perfect example of this in that it's deeply imperative and takes advantage of code loading order to achieve start and stop. It's not the only bit of code I've seen which does effects at load time, either. Hence Jaunt, which tries to be less flawed around namespace reloading/refactoring without going whole hog on ns immutability.

The more I kick this stuff around the more I find that the whole endeavor verges on a Haskell clone in sexprs and the more I decide I'd rather take the time to learn Haskell properly than mess with an immutable lisp or Jaunt. Besides then you get nice stuff like typeclasses and fully typed instance dispatch and equality that actually works and theorems and native bytecode and yeah.


Apartment Hunting - Day 1

For those of you just tuning in, this is part of my ongoing journey of migrating to SF to work for Twitter. The lessons contained herein are a direct result of my coming to the area pretty much blind and paying the various prices for doing so.

So I visited eight apartments/complexes today. Pretty tired although better informed at the end of it, but emotionally exhausted enough by the process to be ranting about it. Prices may vary because I'm going by memory, pretty tired, horrified and generally uninterested in many of these places after today.

  1. Fox Plaza. Older building. Mixed use. Very tired. Rent controlled. New landlord for the residential portion in the last 4 years. An effort has been made to put new paint on and renovate, but it wasn't a really quality job. Bad vibes. $2750 for an unrenovated studio across the street from the office.

  2. NEMA. Newer building. Feels like and is priced like a resort. Justifies its rent on amenities (gym, pool, social shit, rooftop terraces, trainers and staff). Nothing under $3100, and that doesn't really buy you much in terms of the living space itself. Largely access to the amenities it feels like. For an apartment you'd want in the building it'll run $3400.

  3. SOMA residences. Newer building. Not much for staff or amenities. Lovely inner terrace on the 2nd floor which most of the apartments open onto. The place I saw was on the 2nd of 3 floors, so its only window was out onto the terrace and it came off as dark. Badly perfumed, but spacious enough. Couple blocks from the office. $2750 or thereabouts.

  4. The Civic. Nothing under $3400. Not even high service like NEMA, just nicer rooms with more space.

  5. Argenta. l m a o. No studios in the whole building, $3800 and up. No more space than the Civic, lower service than NEMA. Just no.

  6. Olume. Same parent company as Argenta, but studios and "more reasonably" priced (start at $3300 (on promo! $3225!)). Lovely building. Really cute brunette sales lady. Low service but awesome rooftop space and much more attractive rooms than NEMA for the price.

  7. Fell & Gough. Weird little studio facing an inner courtyard in a 1912 building. Great old wood floors that creak with every step, but in an awful location IMO. Kitchen was microscopic/narrow, bathroom didn't look like it'd ever scrub clean from simple age. Maybe for $2300 or $2200, for the asked $2400 no.

  8. Fell & Webster. On the same busy street (Fell) as 7, but actually faces the street. The main living area has noticeable street noise that'll never go down because the street is always high traffic. The bedroom is partitioned from the living area and doesn't really hear the street, but shares walls with 1) outside walkway, 2) next apartment's bedroom 3) neighbor below's bedroom. Kitchen isn't super spiffy. Not that I'll use it a ton but still, at the same price point Fox's was nicer. For $2675 not so much. For $2.6k or $2.5k maybe.

NEMA or Olume represents a commitment of 48% of my income >.> no. Today's lesson is that I'm pretty clearly gonna have to craigslist this one out and find a studio in order to get a lease in my $2.7k and under goal budget.

So. Yeah. Lots of running about. Stressful, because today was a negative lesson: Fox, which I thought was a safe choice, really isn't somewhere I want to be, and everything else like/near it is too damn expensive.

The good news is that the gays I'm staying with got me connected with a local Realtor who does rentals and validated that my goal budget & areas were reasonable. So if I don't find something by Friday I've got an appointment with him. At the low low price of 80% of one month's rent if I sign on something he shows and $500 if I don't sign on anything.

End Of Rant

Deprecation warnings in Jaunt

This week I'm working on getting Jaunt's 1.9.0 release nearer the door, on which note I'm happy to properly demo one of the features of this first release: deprecation warnings.

A bare REPL will do for these demos.

$ cd $(mktemp -d)
$ wget https://clojars.org/repo/org/jaunt-lang/jaunt/1.9.0-RC4/jaunt-1.9.0-RC4.jar
$ java -jar jaunt-1.9.0-RC4.jar

Jaunt #2 (not my best ticket ever in terms of hygiene) was the first work I did on what became Jaunt and was really pretty critical for getting my teeth into the project and validating that there were tractable incremental improvements to be made over Clojure.

This change introduced four new switches to the Jaunt compiler, the most interesting of which is :warn-on-deprecated, which enables or disables compiler warnings in support of ^:deprecated metadata on functions and namespaces. :warn-on-deprecated is on (true) by default, although it can be disabled globally with the JVM system property clojure.compiler.warn-on-deprecated=false, or by set!ing clojure.core/*compiler-options*, say by putting (set! *compiler-options* (dissoc *compiler-options* :warn-on-deprecated)) in a namespace where you want these checks to be disabled.

The semantics of deprecation are simple:

  • A namespace is deprecated if and only if it has the ^:deprecated metadata.
  • A definition (Var) is deprecated if either it occurs within a ^:deprecated namespace, or is itself marked ^:deprecated.

Vars and namespaces may become deprecated in any SemVer minor or major version, but may legally be deleted only in a major version.

Deprecation warnings are only emitted when a deprecated namespace or definition is accessed from a context which is not deprecated. Deprecated contexts are namespaces which are deprecated, and the bodies of deprecated definitions.

So for instance, in the following code:

(defn ^:deprecated bad-idea [x]
  (println "Oh noez!")
  (throw (new Exception "Bad code is bad")))
;; => #'user/bad-idea

(defn ^:deprecated also-bad-idea [x]
  (bad-idea x))
;; => #'user/also-bad-idea

(defn almost-better-idea [x]
  (bad-idea x))
;; Warning: using deprecated var: #'user/bad-idea ...
;; => #'user/almost-better-idea

The definition of bad-idea itself is innocuous. Simply evaluating deprecated definitions is fine. Warning that also-bad-idea uses bad-idea would be superfluous, because that fact is innocuous so long as neither is used. Suppressing warnings in this case allows Jaunt to load arbitrarily much deprecated code without drowning a user with false positive warnings.

Users should only have to care about deprecation at the edge between current code and deprecated code. Thus compiling almost-better-idea will emit a warning that it makes use of bad-idea, since almost-better-idea is not itself deprecated. Likewise at the repl invoking either deprecated function would also generate a warning.
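To pin the rules down, here is a minimal model in Python. It is not how Jaunt's compiler implements them, and `Ns`, `Def`, `is_deprecated` and `should_warn` are hypothetical names. The context of a reference is the definition whose body contains it, so references from deprecated definitions, or from definitions in deprecated namespaces, never warn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ns:
    name: str
    deprecated: bool = False  # ^:deprecated on the ns form


@dataclass(frozen=True)
class Def:
    ns: Ns
    name: str
    deprecated: bool = False  # ^:deprecated on the definition itself


def is_deprecated(d):
    # A definition is deprecated if it is marked, or if its namespace is.
    return d.deprecated or d.ns.deprecated


def should_warn(context, used):
    """Warn only at the edge: non-deprecated code using deprecated code.

    `context` is the definition whose body contains the reference.
    """
    return is_deprecated(used) and not is_deprecated(context)


user = Ns("user")
bad_idea = Def(user, "bad-idea", deprecated=True)
also_bad_idea = Def(user, "also-bad-idea", deprecated=True)
almost_better_idea = Def(user, "almost-better-idea")

print(should_warn(also_bad_idea, bad_idea))       # False: deprecated context
print(should_warn(almost_better_idea, bad_idea))  # True: crosses the edge
```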

As to namespaces, similar principles apply. Consider the two namespaces and trace:

(ns ^:deprecated com.proj.code)
;; => nil

(def some-const 3)
;; => #'com.proj.code/some-const

(ns com.proj.new-code
  (:require [com.proj.code :as old :refer [some-const]]))
;; Warning: aliasing deprecated ns: com.proj.code  ...
;; Warning: referring deprecated var: #'com.proj.code/some-const ...
;; => nil

(def g 4)
;; => #'com.proj.new-code/g

(def h (+ old/some-const g))
;; Warning: using deprecated var: #'com.proj.code/some-const ...
;; => #'com.proj.new-code/h

As the whole com.proj.code namespace is deprecated, merely referencing it and requiring that it be loaded is cause for a warning. Should that namespace ever be removed, the require would break regardless of whether anything from the required namespace is used or not.

If symbols which are deprecated are referred into a namespace which is not deprecated, each one of those will generate a warning as well, for the same reason. However, Jaunt tries to be smart about this, suppressing such warnings if a (require '[... :refer :all]) or a (use ...) directive has been issued, since that constitutes an indefinite referral.

The symbol com.proj.code/some-const is deprecated by dint of being defined in a deprecated namespace. Consequently a warning is emitted when it is used outside of a deprecated context, here in the definition of com.proj.new-code/h.
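The require-time behaviour can be modeled the same way in Python. `require_warnings` is a hypothetical function that returns `(kind, name)` pairs rather than Jaunt's actual messages. One assumption goes beyond the text: an indefinite referral (:refer :all or use) only suppresses the per-var warnings, and requiring a deprecated namespace still warns.

```python
def require_warnings(target, deprecated_nses, deprecated_vars,
                     refer=(), refer_all=False, from_deprecated=False):
    """Which warnings one :require clause would produce, as (kind, name) pairs.

    target:          the namespace being required
    deprecated_nses: names of deprecated namespaces
    deprecated_vars: qualified names of deprecated vars (every var in a
                     deprecated namespace counts)
    """
    if from_deprecated:
        # Deprecated contexts never warn.
        return []
    warnings = []
    if target in deprecated_nses:
        # Merely requiring the ns will break once it is removed.
        warnings.append(("ns", target))
    if not refer_all:
        # Explicit referrals warn one by one; :refer :all (or use) is an
        # indefinite referral, so per-var warnings are suppressed.
        for sym in refer:
            qualified = f"{target}/{sym}"
            if qualified in deprecated_vars:
                warnings.append(("var", qualified))
    return warnings


DEPRECATED_NSES = {"com.proj.code"}
DEPRECATED_VARS = {"com.proj.code/some-const"}

print(require_warnings("com.proj.code", DEPRECATED_NSES, DEPRECATED_VARS,
                       refer=["some-const"]))
```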

One last demo, disabling warnings!

(ns com.proj.no-warnings)
;; => nil

(set! *compiler-options* 
  (dissoc *compiler-options* :warn-on-deprecated))
;; elided...

(def ^:deprecated a 3)
;; => #'com.proj.no-warnings/a

(def b (+ a 4))
;; => #'com.proj.no-warnings/b

There's more than just this coming down the line in Jaunt, so if you're interested check out the 1.9 CHANGELOG, watch the repo or stay tuned here for more demos. Issues, ideas and especially pull requests are welcome :D