06 Oct 2016
For the last couple of years I've been working with Clojure, a lisp which runs on top of the JVM.
My reservations about Clojure itself, and about Clojure's maintainership, are at this point fairly well established.
However I'd be lying if I said that after thinking long and hard about the way I want to develop software I've come up with anything incrementally achievable and better.
Clojure's syntax is convenient.
Its datastructures are clever.
Its immutable defaults are sane with respect to any other language.
Its integration with the JVM, while fatal to its own semantics, ensures unmatched leverage.
In short, I don't know if it's possible to do better atop the JVM.
But why use the JVM?
The JVM itself is a wonder of engineering, compiler technology and language research.
But the JVM standard library is deeply mutable and makes assumptions about the way that we can and should program that aren't true anymore.
While the JVM itself may be awesome, I'm just not convinced that the object/class library it comes with is something you'd actually want around as a language designer.
Especially as the designer of a functionally oriented language, or a language with a different typing/dispatch model than Java's.
The conclusion I personally came to was that, faced with what already exists on the JVM, I couldn't muster the wherewithal to work in that brownfield.
I needed a greenfield to work with.
Somewhere I could explore, have fun exploring and make mistakes.
I honestly don't remember why I chose the name Dirt, but it's stuck in my head and adorns the git repo where all of this lives.
Dirt isn't something I'm gonna be releasing on GitHub, although you can totally browse the code on git.arrdem.com.
Simply, Dirt isn't intended for anyone else to use or contribute to now or in the foreseeable future.
It's my experiment, and I think that my total control over the project is important to, well, finishing it sometime.
So what's the goal?
Versioning is something I think is really important both at the package and compilation unit level.
I previously wrote about artifact versioning, and experimented with versioned namespaces in my fork of Clojure.
Unfortunately, as a user experience versioned namespaces didn't pan out.
There were too many corner cases, and too many ways that partial recompilation could occur and generate disruptive, useless warnings.
So versioning is one major factor in Dirt's design.
Another is functional programming.
After working with the JVM, I'm pretty much convinced that design by inheritance is just flat out wrong.
While I was working with Dr Perry, he shared an awesome paper with me: Object-Oriented programs and Testing.
The point of the paper essentially is that testing inheritance structured programs adequately is really hard and everybody does it wrong.
Testing mutable programs is already hard enough, and single inheritance poses so many design difficulties that it just doesn't seem worthwhile.
In my time using Clojure, I found that I never actually needed inheritance.
Interfaces sufficed, and functions which operated against interfaces were better captured as functions in namespaces than as inherited static functions packaged away in some base class or helper package.
The failure mode of interfaces as presented by Java however is that they are closed.
Only the implementer of a type can make it participate in an interface.
This is far too restrictive.
Haskell style typeclasses get much closer to my idea of what interfaces should look like, in terms of being open to extension/implementation over other types and carrying contracts.
Which brings me to my final goal: contracts or at least annotations.
I like types.
I like static dispatch.
While I can work in dynlangs, I find that the instant my code stops being monomorphic at least with respect to some existential type constructed of interfaces/protocols I start tearing my hair out.
Types are great for dispatch, but there's lots and lots of other static metadata about programs which one could potentially do dataflow analysis with.
For all that I think Shen is batshit crazy and useless, the fact that it provides a user extensible type system which can express annotations is super interesting.
Basically I think that if programmers are given better-than-Clojure program metadata, and tools for interacting with that metadata, it would be possible to push back against @Jonathan_Blow's excellent observation that comments and documentation always go stale and become worthless.
So what's the architecture?
DirtVM so far is just a collection of garbage collectors and logging interfaces I've built.
All of the rest of this is hot air I hope to get to some day.
The fundamental data architecture is something like this:
[Diagram, only partially recoverable: license terms (GPL1 | GPL2 | EPL | MIT | Other); a bag of attributes; types, ifaces, fns.]
The essential idea is that because versioning is so hard, it's easier to fix the runtime to allow co-hosting of multiple versions of artifacts than to somehow try and solve the many many difficulties of software versioning and artifact development.
Java 9 modules look really really good and come close to being an appropriate solution, but the Java team have abandoned the idea of versioned modules.
In Dirt when code is compiled within a namespace it has access only to what has been explicitly imported by that namespace.
Imports are restricted to the contents of the module and the module's direct dependencies.
It is not possible for a namespace to import a transitively depended module.
This means that at all times a user is in direct control of what version of a function or a type they are interacting with.
There is no uncertainty.
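The import rule can be sketched in a few lines of Clojure. This is only an illustration under stated assumptions: the module names, the `modules` map and the `importable?` function are hypothetical and are not part of Dirt.

```clojure
;; Hypothetical module graph: A depends on B, B depends on C.
(def modules
  {'A {:deps #{'B}}
   'B {:deps #{'C}}
   'C {:deps #{}}})

(defn importable?
  "True when code in module `from` may import from module `target`:
  either its own contents or one of its direct dependencies."
  [modules from target]
  (or (= from target)
      (contains? (get-in modules [from :deps]) target)))

(importable? modules 'A 'B) ;; => true, B is a direct dependency
(importable? modules 'A 'C) ;; => false, C is only reachable through B
```

Because the check never follows a dependency edge more than one step, a transitive dependency can change versions without a namespace ever importing it by accident.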
This gets a little sticky for datastructures.
If A depends directly on B and B depends directly on C, it's possible that B will return into A a data structure, function or closure which comes from the module C.
This turns out to work fine.
Within a single module, protocol/interface dispatch is done only against the implementation(s) visible in that module scope.
Since A has no knowledge at compile time of any such type from C, it can't do anything with such a type except hand it back to B which can use it.
Types and interfaces are very Haskell style.
Mutability will be supported, but avoided in the standard library wherever possible.
Interfaces will be typeclass style pattern matching dispatch, not call target determined.
This makes real contracts like Eq possible and extensible rather than being totally insane like non-commutative object equality.
Types are just records, and will be internally named and distinct by package, version, namespace and name.
This makes it possible to have multiple consistent implementations of the same interface against versions of the same type co-hosted.
Much in the Haskell style, the hope is that for the most part interface dispatch can be statically resolved.
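As a rough approximation in Clojure, an open Eq contract that dispatches on both arguments rather than on the call target can be sketched with a multimethod. This is an assumption-laden stand-in, not Dirt syntax: `eq?` and `Point` are hypothetical names, and unlike the design described above the dispatch here happens at runtime.

```clojure
(defrecord Point [x y])

;; Dispatch on the types of BOTH arguments, not just the receiver.
(defmulti eq?
  (fn [a b] [(class a) (class b)]))

;; Anyone can add an implementation for a pair of types after the fact.
(defmethod eq? [Point Point] [a b]
  (and (= (:x a) (:x b))
       (= (:y a) (:y b))))

(defmethod eq? [String String] [a b]
  (= a b))

;; Values of unrelated types are never equal, in either argument order.
(defmethod eq? :default [_ _] false)

(eq? (->Point 1 2) (->Point 1 2)) ;; => true
(eq? (->Point 1 2) "point")       ;; => false
(eq? "point" (->Point 1 2))       ;; => false
```

Since both argument types take part in dispatch, the two mixed-type calls agree, which is the commutativity that receiver-based equality does not guarantee.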
Why record types instead of a Lua or Python style dynamic object dispatch system?
Because after working with Clojure for a while now it's become clear to me that whatever advantages dynamic typing may offer in the small are entirely outweighed by static typing's advantages in the large, and that packaging functions into objects and out of namespaces buys you nothing.
While dynamic typing and instance dispatch can enable open dispatch they also defeat case based reasoning when reliability is required.
Frankly my most reliable Clojure code would have translated directly to Haskell or OCaml.
Refactoring matters especially as projects grow.
Being able to induct someone else into your code matters.
Being able to produce meaningful errors someone understands and can trace to a root cause requires information equivalent to types.
Dynamic typing just obscures static contracts and enables violations to inadvertently occur, leaning on exhaustive test coverage.
Dynamic typing introduces concrete runtime costs, and slows down program evolution because building tools is simply harder.
Tools matter, so static typing ho.
In addition to interfaces/typeclasses, there are also fns (fn in Clojure) which are statically typed, non-extensible, single arity procedures.
Despite impurity, the term function is used for these in keeping with industry convention.
The namespaces are very much Clojure style, because I've been really happy with the way that Clojure's namespaces work out for the most part and I want to support a language which isn't at least syntactically and in the namespace system that distant from Clojure.
Import renaming is awful, but qualified imports are fine, which is why imports support aliases.
The ultimate goal of this project is to be able to present a virtual machine interface which is itself versioned.
Imagine if you could write software which used dependencies themselves targeting incompatible/old versions of the standard library!
That would solve the whole problem of language and library evolution being held to the lowest common denominator.
Dirt itself will be a garbage collected, mostly statically typed bytecode VM much like the JVM.
Probably gonna get an SSA/dataflow bytecode level representation rather than a stack machine structure.
But that level of detail I'll figure out when I get to it.
For now I'm having fun with writing C and garbage collectors.
The next step will probably be some pre-dirt language to help me generate C effectively.
Here's to log cabins and projects of passion!
02 Sep 2016
Ferd T-H was kind enough to perfectly voice one of my long standing frustrations with Clojure and, it feels like, many small programming communities, and I couldn't resist sharing.
A community of people fine with inadequate tooling/docs/attitude/etc overlooks those who avoided it for that reason.
Then again, reframe the question as a trial by fire and dealing with less than ideal conditions becomes a badge of honor.
Which in turns possibly only breeds more tolerance for entrenched inadequacy.
You need fresh eyes to point out your slanted perspective.
My criticism of any "design for experts" philosophy may be inferred.
13 Jun 2016
As with some of my other posts, this was originally an email which turned into enough of a screed I
thought it would be of general interest and worth posting. Some op/ed material has been removed, but
the technical details are unaltered if not clarified.
So here's the rundown I've got so far on a lisp with immutable environments. I'm merely gonna sketch
at some stuff, because giving a full treatment would require dusting off and finishing a bunch of
stuff I came up with for Ox and put down again.
I also threw out the Haskell sketch I was dinking with because it
was just distracting from writing this :P
So let's start from the top.
Clojure is a form-at-a-time language which happens to support reading files of forms (or rather
Readers which produce textual representations of forms, where they come from is magical).
Binding is achieved through a global, thread shared, transactional, mutable mapping from symbols to
Namespaces, which are themselves transactional mappings from symbols to Vars, which are themselves a
triple (QualifiedSymbol, Metadata, Binding). Vars happen to also support dynamic binding (pushing and
popping), but this is less used. From here on out I'll treat them simply as a global mostly static
binding mechanism, which is their primary use case anyway.
Control and local bindings are achieved entirely using lambda/fn forms compiled to JVM methods,
produced by the opaque compiler as opaque IFn/AFn objects. Top level forms are compiled and executed
for effect by being wrapped in an anonymous (fn [] <form>) which can be blindly invoked by the
compiler to realize whatever effects the form may have.
Circular dependencies are achieved via Var indirection. A Var is first created with no binding,
so that it can be referenced. With that environment binding, code which will depend on (call) the
Var's final bound value can be compiled since it just needs the target Var to exist in order to
compile. The depended on Var may then be redefined, having both itself and its dependent visible
in the compilation context and so the two Var bindings can be mutually recursive.
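In stock Clojure this is what declare is for. A minimal sketch, where my-even? and my-odd? are made-up names for illustration:

```clojure
;; Create the Var with no binding so that later code can reference it.
(declare my-odd?)

;; Compiles fine: my-odd? only needs to exist as a Var at this point.
(defn my-even? [n]
  (if (zero? n)
    true
    (my-odd? (dec n))))

;; Rebinding the Var closes the loop; both bindings are now visible.
(defn my-odd? [n]
  (if (zero? n)
    false
    (my-even? (dec n))))

(my-even? 10) ;; => true
(my-odd? 10)  ;; => false
```

Each call goes through the Var at runtime, which is exactly the indirection that makes redefinition possible and static linking hard.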
While traditional and lispy, this approach has a number of problems which I'm sure I don't need to
reiterate here. The goals of an immutable namespace system then should be to
- Make Namespaces the compilation unit rather than Forms
- Make Namespaces reloadable (change imports/exports/aliases w/o system reboot)
- Enable Namespace compilation to fail sanely
- Enable refactoring/analysis
The sketch of this I've been playing with is as follows:
Let's assume an interpreter. If you can interpret you can compile as an implementation detail of eval
and interpretation is easier to implement initially. Assume a reader. So we read a form, then we
have to evaluate it. Evaluation is first macroexpansion, then sequential evaluation of each
resulting form in the environment. Evaluation at the top level is eval Env Form -> Env (we discard
the result) which is fine. Because we aren't really doing compilation, only interpretation,
Clojure's tactic of having Vars and analyzing against Var bindings works 1:1. Create an unbound
symbol, analyze/bind against it, then add a binding later.
The (global) environment structure I had in mind is pretty simple: one unqualified symbol names the
current namespace (*ns*), a mapping exists from unqualified symbols to Namespaces.
Namespaces are essentially compilation contexts, and really just serve to immutably store the
alias mapping, import mapping, the mapping of the Vars in the namespace to bindings, and a list of
Vars/exports from the namespace.
Vars are a fully qualified name, metadata and a value. That's it.
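As plain persistent maps, that structure might look like the sketch below. All of the key names and helper functions here (:current-ns, make-ns, intern-var and so on) are assumptions made for illustration; they are not taken from Ox or Jaunt.

```clojure
;; The whole global environment is one immutable value.
(def empty-env
  {:current-ns 'user   ; unqualified symbol naming the current namespace
   :namespaces {}})    ; unqualified symbol -> namespace map

(defn make-ns
  "An empty namespace: just the mappings a compilation context needs."
  []
  {:aliases {} :imports {} :vars {} :exports #{}})

(defn make-var
  "A Var is a fully qualified name, metadata and a value."
  [ns-sym sym metadata value]
  {:name  (symbol (name ns-sym) (name sym))
   :meta  metadata
   :value value})

(defn intern-var
  "Returns a NEW environment in which ns-sym/sym is bound to value."
  [env ns-sym sym metadata value]
  (-> env
      (update-in [:namespaces ns-sym] #(or % (make-ns)))
      (assoc-in [:namespaces ns-sym :vars sym]
                (make-var ns-sym sym metadata value))))

(def env1 (intern-var empty-env 'user 'x {} 1))
(get-in env1 [:namespaces 'user :vars 'x :value]) ;; => 1
(:namespaces empty-env)                           ;; => {}, the old env is untouched
```

Because intern-var returns a new value instead of mutating, aborting a failed load is just a matter of keeping the environment you started with.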
Compilation occurs at the namespace level. All the forms in a namespace are sequentially read and
evaluated in interpretation mode. All macros expand normally and type hints are processed; they just
don't do anything.
Once the whole namespace has been successfully loaded in interpretation mode, a second pass may be
made in which static linking is performed. Because everything in the namespace has been fully
realized already it's possible to statically link even mutually recursive calls within the same
namespace. Full type inference within the namespace is also possible here, etc.
Aside: Interestingly because it is single pass, the existing Clojure compiler doesn't (can't)
handle hinted mutually recursive function calls at all. It relies on Var metadata to store arglists
(and hints for primitive invocations), so in order to emit a hinted recursive call the forward
declaration of the var has to carry the same ^:arglists metadata (hints and all!) which defn will
attach whenever the var is finally defined.
Once a namespace has been fully loaded and compiled (including the second pass) the global
environment with the resulting namespace and var mappings is the result of whatever caused
loading. If an error occurs during compilation, we just abort and return the environment state
before anything else happened. I think there are tricks to be played with generated garbage class
names and classloader isolation here so that this works even during the 2nd static compile pass, but
it should be possible to fully clean up a failed compile.
So this works really great for module loading, and even for macros which expand into multiple def
forms. This model of eval :: env -> form -> (env, result) starts to break down when you want to
talk about macros that do evaluation during code loading, and likewise the REPL. Basically your
interpreter winds up holding an additional reference, *env* or something, which is intrinsically
mutable, but which references an immutable environment and holds shall we say the state continuation
of the entire REPL. Consider user code which actually calls eval during execution. Whatever state
changes that evaluation creates need to be preserved in the "global" continuation.
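A toy version of that split, assuming a drastically simplified evaluator: eval-form, repl-env and repl-eval! are hypothetical names, and the evaluator only understands literals, symbols and (def sym literal).

```clojure
(defn eval-form
  "Pure evaluator: takes an immutable env map and a form and returns
  [new-env result]. The env passed in is never modified."
  [env form]
  (cond
    ;; (def sym value) yields a new environment with one more binding.
    (and (seq? form) (= 'def (first form)))
    (let [[_ sym value] form]
      [(assoc env sym value) sym])

    ;; Symbols are looked up in the environment.
    (symbol? form)
    [env (get env form)]

    ;; Everything else evaluates to itself.
    :else
    [env form]))

;; The one intrinsically mutable reference: it points at the current
;; immutable environment, i.e. the REPL's state continuation.
(def repl-env (atom {}))

(defn repl-eval!
  "Evaluates form against the current env and stores the resulting env."
  [form]
  (let [[env' result] (eval-form @repl-env form)]
    (reset! repl-env env')
    result))

(repl-eval! '(def x 3))
(repl-eval! 'x) ;; => 3
```

Only repl-eval! touches mutable state; everything underneath it stays a function from one environment to the next.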
In this model when you compile a form at the REPL, the runtime could obviously interpret (this may
be appropriate for macroexpansion) and can also compile. This is only tricky when the user
recompiles a symbol which was already bound/compiled. In this case, the runtime could either eagerly
recompile all the code which links against this function using the same interpret then statically
link tactic or could just invalidate any compiled bytecodes and lazily recompile later. The former
is probably more predictable.
Once a namespace has been reloaded/altered, all namespaces importing it must also be recompiled in
topsort order using the same tactics. That we already have per-symbol dependency information and
per-module dependency information helps with building refactoring tools which otherwise have to
derive all this themselves. Ideally top level expressions/definitions would also expose local
variable tables and local variable dataflow graphs so that analysis there is also possible.
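The recompile ordering can be sketched as follows, assuming the per-module dependency information is available as a map from each namespace to the set of namespaces it imports. The graph and the function names are hypothetical.

```clojure
;; Hypothetical graph: namespace -> set of namespaces it imports.
(def imports
  {'app.util #{}
   'app.db   #{'app.util}
   'app.core #{'app.db 'app.util}})

(defn dependents
  "Every namespace which directly or transitively imports `changed`."
  [imports changed]
  (loop [found #{}
         frontier #{changed}]
    (let [newly (set (for [[ns-sym deps] imports
                           :when (and (some frontier deps)
                                      (not (contains? found ns-sym)))]
                       ns-sym))]
      (if (empty? newly)
        found
        (recur (into found newly) newly)))))

(defn recompile-order
  "Dependents of `changed`, ordered so that each namespace comes after
  everything it imports (a topological sort by repeated peeling)."
  [imports changed]
  (loop [order []
         remaining (dependents imports changed)]
    (if (empty? remaining)
      order
      (let [ready (sort (filter #(not-any? remaining (imports %)) remaining))]
        (when (empty? ready)
          (throw (ex-info "Cyclic namespace dependency" {:remaining remaining})))
        (recur (into order ready) (apply disj remaining ready))))))

(recompile-order imports 'app.util) ;; => [app.db app.core]
```

The ex-info branch is where a cyclic import would surface, since a cycle leaves no namespace whose imports are all already recompiled.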
Aside: It may be possible to allow cyclic dependencies between namespaces (see submodules in
Racket for some study on this problem). In the common case it may well be that macros are well
behaved and that cyclic dependencies between modules work out just fine. However because macros
perform arbitrary computation it's also pretty easy to cook up pathological cases where macro
generated code never reaches a fixed point in repeated interpretive analysis which can be statically
compiled. For this reason I'm inclined towards formalizing a module special form and throwing out
import and friends as legal top level forms altogether. Namespaces should be declarative
as in Java/Haskell.
There's probably a way which I just haven't seen yet to treat updates to the "global" namespace
mapping and updates within a namespace the same way via the same mechanism since they're both
dependent on the same dependency tracking and recompile machinery.
Writing this has made me think about having an ImmutableClassLoader which is derived entirely from an
immutable environment and loads classes by lazily (statically) compiling forms.
That's kinda all I got. ztellman gets credit for the mutable *env* in the repl bit which I spent
literally weeks trying to do without last year. Maybe something with monad transformers can get you
there but I couldn't figure it out. mikera has some sketches of all this with types in his
I've already prototyped an environment system that looks a lot like this in the Ox repo if you want
to go dig. Originally there was this
then I started dinking with a Java implementation of the environment structure
which also has a couple different stack based contextual binding types for the interpreter.
As written about in the Jaunt essays, I've kinda concluded that whatever this immutable ns language
winds up looking like is so divorced from what Clojure is, or at least the way that most people write
Clojure, that you'll lose so much value in the changeover it won't pay off for a very long time. The
mount library is a perfect example of this in that it's deeply
imperative and takes advantage of code loading order to achieve start and stop. It's not the only
bit of code I've seen which does effects at load time either. Hence Jaunt which tries to be less
flawed around namespace reloading/refactoring without going whole hog on ns immutability.
The more I kick this stuff around the more I find that the whole endeavor verges on a Haskell clone
in sexprs and the more I decide I'd rather take the time to learn Haskell properly than mess with an
immutable lisp or Jaunt. Besides then you get nice stuff like typeclasses and fully typed instance
dispatch and equality that actually works and theorems and native bytecode and yeah.
31 May 2016
For those of you just tuning in, this is part of my ongoing journey of migrating to SF to work for
Twitter. The lessons contained herein are a direct result of my coming to the area pretty much
blind and paying the various prices for doing so.
So I visited eight apartments/complexes today. Pretty tired although better informed at the end of
it, but emotionally exhausted enough by the process to be ranting about it. Prices may vary because
I'm going by memory, pretty tired, horrified and generally uninterested in many of these places after
1. Older building. Mixed use. Very tired. Rent controlled. New landlord for the residential portion
   in the last 4 years. An effort has been made to put new paint on and renovate, but it wasn't a
   really quality job. Bad vibes. $2750 for an unrenovated studio across the street from the office.
2. NEMA. Newer building. Feels like and is priced like a
   resort. Justifies its rent on amenities (gym, pool, social shit, rooftop terraces, trainers and
   staff). Nothing under $3100, and that doesn't really buy you much in terms of the living space
   itself. Largely access to the amenities it feels like. For an apartment you'd want in the
   building it'll run $3400.
3. SOMA residences. Newer building. Not much for staff or
   amenities. Lovely inner terrace on the 2nd floor which most of the apartments open in onto. The
   place I saw was 2nd of 3 floors, so only window was out onto the terrace and it came off as
   dark. Badly perfumed, but spacious enough. Couple blocks from the office. $2750 or thereabouts.
4. The Civic. Nothing under $3400. Not even high service like NEMA, just
   nicer rooms with more space.
5. Argenta. l m a o. No studios in the whole building, $3800 and
   up. Not more space than the Civic, lower service than NEMA. Just no.
6. Olume. Same parent company as Argenta, but studios and "more reasonably"
   priced (start at $3300 (on promo! $3225!)). Lovely building. Really cute brunette sales lady. Low
   service but awesome rooftop space and much more attractive rooms than NEMA for the price.
7. Fell & Gough. Weird little studio facing
   an inner courtyard in a 1912 building. Great old wood floors that creak with every step, but in
   an awful location IMO. Kitchen was microscopic/narrow, bathroom didn't look like it'd ever
   scrub clean from simple age. Maybe for $2300 or $2200, for the asked $2400 no.
8. Fell & Webster. On the same busy street
   (Fell) as 7, but actually faces the street. The main living area has noticeable street noise
   that'll never go down because the street is always high traffic. The bedroom is partitioned from
   the living area and doesn't really hear the street, but shares walls with 1) outside walkway, 2)
   next apartment's bedroom 3) neighbor below's bedroom. Kitchen isn't super spiffy. Not that I'll
   use it a ton but still, at the same price point Fox's was nicer. For $2675 not so much. For $2.6k
   or $2.5k maybe.
NEMA or Olume represents a commitment of 48% of my income >.> no. Today's lesson is that I'm pretty
clearly gonna have to craigslist this one out and find a studio in order to get a lease in my $2.7k
and under goal budget.
So. Yeah. Lots of running about. Stressful because today was a negative lesson in that Fox which I
thought was a safe choice really isn't somewhere I want to be and everything else like/near it is
too damn expensive.
The good news is that the gays I'm staying with got me connected with a local Realtor who does
rentals and validated that my goal budget & areas were reasonable. So if I don't find something by
Friday I've got an appointment with him. At the low low price of 80% of one month's rent if I sign
on something he shows and $500 if I don't sign on anything.
End Of Rant
08 Mar 2016
This week I'm working on getting Jaunt's 1.9.0 release nearer
the door, on which note I'm happy to properly demo one of the features of this first release:
A bare REPL will do for these demos.
$ cd $(mktemp -d)
$ wget https://clojars.org/repo/org/jaunt-lang/jaunt/1.9.0-RC4/jaunt-1.9.0-RC4.jar
$ java -jar jaunt-1.9.0-RC4.jar
Jaunt #2 (not my best ticket ever in terms of hygiene)
was the first work I did on what became Jaunt and was really pretty critical for getting my teeth
into the project and validating that there were tractable incremental improvements to be made over
This change introduced four new switches to the Jaunt compiler, the most interesting of which is
:warn-on-deprecated, which enables or disables compiler warnings in support of the use of
^:deprecated metadata on functions and namespaces. :warn-on-deprecated is on by default, although
it can be disabled globally with the JVM system property
clojure.compiler.warn-on-deprecated=false, or by
(set! *compiler-options* (dissoc *compiler-options* :warn-on-deprecated)) in a namespace
where you want these checks to be disabled.
The semantics of deprecation are simple:
- A namespace is deprecated if and only if it has the ^:deprecated metadata.
- A definition (Var) is deprecated if either it occurs within a ^:deprecated namespace, or is
  itself tagged ^:deprecated.
Vars and namespaces may become deprecated in any SemVer minor version or major version, but may
legally be deleted only on a major version.
Deprecation warnings are only emitted when a deprecated namespace or definition is accessed from a
context which is not deprecated. Deprecated contexts are namespaces which are deprecated, and the
bodies of deprecated definitions.
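Stated as code, the rule is small. The sketch below is not Jaunt's implementation; it models a definition as a map holding its namespace's metadata and its own metadata, and every name in it is hypothetical.

```clojure
(defn deprecated-def?
  "A definition is deprecated when its namespace or the definition
  itself carries :deprecated metadata."
  [{:keys [ns-meta var-meta]}]
  (boolean (or (:deprecated ns-meta)
               (:deprecated var-meta))))

(defn warn-on-use?
  "Warn only when a deprecated target is used from a context which is
  not itself deprecated."
  [context target]
  (and (deprecated-def? target)
       (not (deprecated-def? context))))

;; Hypothetical definitions, modelled as plain maps.
(def old-fn     {:ns-meta {} :var-meta {:deprecated true}})
(def old-caller {:ns-meta {:deprecated true} :var-meta {}})
(def new-caller {:ns-meta {} :var-meta {}})

(warn-on-use? new-caller old-fn)     ;; => true, crossing the edge
(warn-on-use? old-caller old-fn)     ;; => false, deprecated context
(warn-on-use? new-caller new-caller) ;; => false, nothing deprecated
```

Only the first call crosses from current code into deprecated code, so only it would produce a warning.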
So for instance, in the following code:
(defn ^:deprecated bad-idea [x]
  (println "Oh noez!")
  (throw (new Exception "Bad code is bad")))
;; => #'user/bad-idea
(defn ^:deprecated also-bad-idea [x]
  (bad-idea x))
;; => #'user/also-bad-idea
(defn almost-better-idea [x]
  (bad-idea x))
;; Warning: using deprecated var: #'user/bad-idea ...
;; => #'user/almost-better-idea
The definition of bad-idea itself is innocuous. Simply evaluating deprecated definitions is
fine. Warning that also-bad-idea uses bad-idea would be superfluous, because that fact is
innocuous so long as neither is used. Suppressing warnings in this case allows Jaunt to load
arbitrarily much deprecated code without drowning a user with false positive warnings.
Users should only have to care about deprecation at the edge between current code and deprecated
code. Thus compiling almost-better-idea will emit a warning that it makes use of bad-idea, because
almost-better-idea is not itself deprecated. Likewise at the repl invoking either deprecated
function would also generate a warning.
As to namespaces, similar principles apply. Consider the two namespaces and trace:
(ns ^:deprecated com.proj.code)
;; => nil
(def some-const 3)
;; => #'com.proj.code/some-const
(ns com.proj.new-code
  (:require [com.proj.code :as old :refer [some-const]]))
;; Warning: aliasing deprecated ns: com.proj.code ...
;; Warning: referring deprecated var: #'com.proj.code/some-const ...
;; => nil
(def g 4)
;; => #'com.proj.new-code/g
(def h (+ old/some-const g))
;; Warning: using deprecated var: #'com.proj.code/some-const ...
;; => #'com.proj.new-code/h
As the whole com.proj.code namespace is deprecated, merely referencing it and requiring that it be
loaded is cause for a warning. Should that namespace ever be removed, the require would break
regardless of whether anything from the required namespace is used or not.
If symbols which are deprecated are referred into a namespace which is not deprecated, each one of
those will generate a warning as well for the same reason. However Jaunt tries to be smart about
this, suppressing such warnings if a (require '[... :refer :all]) or a (use ...) directive has
been issued which constitutes an indefinite referral.
The Var com.proj.code/some-const is deprecated by dint of being defined in a deprecated
namespace. Consequently a warning is emitted when it is used outside of a deprecated context, here
in the definition of h.
One last demo, disabling warnings!
(ns com.proj.no-warnings)
;; => nil
(set! *compiler-options*
  (dissoc *compiler-options* :warn-on-deprecated))
(def ^:deprecated a 3)
;; => #'com.proj.no-warnings/a
(def b (+ a 4))
;; => #'com.proj.no-warnings/b
There's more than just this coming down the line in Jaunt, so if you're interested check out the
1.9 CHANGELOG, watch
the repo or stay tuned here for more demos. Issues, ideas and
especially pull requests are welcome.