Thoughts on Gratipay

Those of you who, like me, don't read all the news may be vaguely aware that Gittip has rebranded to Gratipay of late, and then shut down for some time. As a former Gittip customer/user, this was visible to me only in that the weekly stream of tips disappeared.

It turns out that Gratipay shut down for two reasons. First off, their parent payment processor closed their doors (1). In trying to find another payment processor, the team ended up talking to legal counsel and discovering that they were operating what I understand to be an unlicensed wire transfer service (2). This forced the team to rethink their business model so as to, well, be legal. See (3).

The new direction espoused by the Gratipay team (4) is, in short, to support not individuals working on open source (or other) projects but rather the cultures around the projects themselves, by allowing individuals to donate to a "project" and participants in that project to assign themselves "takes" from that project's revenues (5).

The idea seems to be that this will, in aggregate, allow individuals to draw support from the projects they voluntarily contribute to, and to support the groups working on other tools, for whatever definition of "group" the Gratipay "project" owner decides upon.

While I'm not sure if this is a generally good idea or not, I can say that as an individual it is definitely a sub-optimal approach to achieving any sort of funding for a project. Over the last three years I've been engaged in open source volunteer work in and around the Clojure community, and my experience is that the overwhelming majority of projects fall under one of three categories: single volunteer maintainer, paid maintainer team with other responsibilities, and abandonware. The exception to this rule is the CIDER project (6), which has several very active volunteer contributors presided over by Bozhidar.

For single maintainer projects, to me the concept of "open work" as promoted by the Gratipay team doesn't make a whole lot of sense. Consider Grimoire. Sure, I have an issue tracker and I'll gladly consider and merge changes, but at what point is someone said to have "joined" the project rather than having simply "contributed"? Were someone to write and subsequently contribute documentation to Grimoire, they would be at best a contributor, not an owner.

Even in the case of maintained projects, there are other (7) criticisms of this "open work" model which may be leveled, although my own thoughts on these particular critiques are still unclear.

I note with some amusement that Justin Smith, arguably one of the two other major "contributors" to Grimoire, funds me elsewhere.

Multiple maintainer projects are, in my experience, typically drivers and other tools which were or are developed for corporate use and are maintained in an open source manner for whatever reason. As these projects are already payrolled by other ventures, there is no clear reason why they should participate in an "open work" model when the majority of ongoing work is expected to be done by employees.

Finally, maintainer-less projects to which no changes are being accepted clearly are not "open work", since nobody can do work on them without forking and creating a new project.

So in short I personally conclude that "open work" is a misfit for every common state of maintainership I've encountered. In either of the cases where maintainers are active, the Bountysource model clearly makes more sense: users or maintainers can price out the piecework they wish done, there is no conception of a "contributor" to haggle over, and there are no disputes over contributors' relative assessments of their contributions' worth to settle. The market of people willing to place bounties serves that purpose automatically with some efficiency.

As most projects are single maintainer, it seems to me that the most effective way to sponsor such work is either via a source bounty as above or via direct ongoing payments or tips. You know, the model that Gratipay used to have.

All this said, I am seeking to have a Gratipay project (8) opened until such time as those who are currently attempting to fund me can be reached and their donations rerouted to my Patreon account (9), which seems to serve my purposes better, although payments arrive at monthly rather than weekly intervals.


AIM-514, saved from madness by a Knight

So last time I just brain dumped a bunch of notes on the subject of AI Memo 514. This time I'll attempt to offer an analysis of the paper and compare it to the Knight thesis.

The critical thing to understand about AIM-514, the thing at which the authors hint repeatedly in the paper (although it took me several reads to catch on), is that the chip presented is by no means intended for actual use. As noted in the last few pages (starting with the history section, Pg. 26), Sussman and Steele confess quite openly that the architecture of this chip was entirely an exercise in "because we can". The transformation of a recursive Lisp interpreter into a non-recursive state machine presented in the introduction was already well understood; this project was just an exercise in putting it on a chip. In fact, at the bottom of Pg. 28 the reader is warned:

The processor described here is not to be confused with the "LISP Machine" designed and built at MIT by Greenblatt and Knight ... [which] is organized as a very general-purpose microprogrammed processor of the traditional kind. ... The processor described here is instead intended as an illustration of the abstracted essence of a single technique with as little additional context or irrelevant detail as possible.

Thanks to the "storage manager" unit (Pg. 18), the AIM-514 design is indeed elegant. The state machine which Steele and Sussman present strictly needs only cons, car, cdr and some form of setq, as the entire state of the machine is coded as a series of linked lists representing the bindings, the evaluation stack and the program itself. This creates a uniformity of access which allows the machine to be reduced to operating on an abstraction of a list which (Pg. 13) need never be specified, because the processor itself isn't responsible for the structure of vector memory. Instead, as noted last time, the memory manager acts as a small state machine (Pg. 19), itself a microcoded processor implementing the abstract memory operations in terms of what appears to be a finite (two element) stack, with signaling mechanisms for blocking the "processor" until logical memory operations have completed (Pg. 21, "FREEZE" state).

While elegant, this machine makes no practical sense, nor was it intended to. Every state transition of this machine is bound by one or more memory access operations. Most of the state transitions involve three memory reads or writes in order to manipulate the bindings and the clink list. In the worst case, the binding lookup operation may require arbitrarily many reads to complete a single logical instruction. Furthermore, this machine requires another whole (much more traditional) processor to implement the memory operations which were reduced out of the "kernel" interpreter.

A far more reasonable approach to machine design, and the one pursued by Knight et al., is to have a single fairly traditional processor which consumes a more regular instruction stream and maintains far less state, and, if an interpreter is truly needed, to provide a software implementation thereof rather than relying entirely upon interpretation and tolerating its performance penalties.

Next time will be more on the Knight machine, and probably an argument against microcode.


AIM-514, the beginning

In which I take a deliberate swan dive off the deep end.

Returning readers may recall that I am no fan of LispMs. However, this semester at UT Austin I'm taking a research reading class on computer architecture. The focus of the class, and hopefully of next semester's project, is parallelism in computer architectures. As my professor has a sense of humor, his first assignment for me was to implement exactly one of the linked list interpreting machines which I previously mocked.

So here we go.

The paper in question is AI Memo 514, Steele and Sussman '79. This post is just an analysis of the paper, which I wrote to force myself to understand Steele's memory controller. It is not a presentation of or introduction to either Steele's design or my own; rather it is a bunch of notes on both which I needed to get out of my head and figured I would publish. Future posts will present and talk about the resulting FPGA implementation.

Memory model

This machine uses a 16 bit address space and 24 bit words. Pointer words are pairs [tag:8, value:16] where tag is one of the types or opcodes, and value is a memory address. Other word interpretations also exist but are special cases.
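
As a concrete sketch (mine, not the memo's; the helper names are made up), packing and unpacking such a word might look like this in Clojure:

;; Pack an 8 bit tag over a 16 bit value into a single 24 bit word.
(defn pack-word [tag value]
  (bit-or (bit-shift-left (bit-and tag 0xFF) 16)
          (bit-and value 0xFFFF)))

(defn word-tag   [w] (bit-and (bit-shift-right w 16) 0xFF))
(defn word-value [w] (bit-and w 0xFFFF))

(word-tag (pack-word 0x07 0x1234)) ;=> 7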

That structures tend to be small, randomly accessed and short lived is a long established fact in the study of garbage collection and object lifetimes. A naive memory model where literal lists are used to represent all storage, while possible, sacrifices a lot of space storing link pointers, and a machine using this model sacrifices much time traversing them. Were I to implement a strictly linked list based system, a 48 bit word fully encoding a pair (car, cdr) would be more natural. However, as hinted above and as studied in the various cdr coding papers (AIM-578 ~ Steele '80 & citations), frequently the structure we desire is not truly a linked list although we may encode it as one; rather it is a buffer or an n-tuple which may have some very compact & efficient vector representation given a known size.

This suggests that buffer allocation, while not logically part of the memory model, must be supported, and leads to the conclusion of a 24 bit word size.


Steele observes on Pg. 13 that a constant requirement of the naive list interpreters presented earlier is to test the head of the list for equality against the set of symbols which the interpreter knows how to handle. It would be far more efficient, Steele observes, to simply encode in the type field of the pointer to the list the op to be performed on that list. This encoding saves at least one memory access per expression evaluated and is a subtlety worth noting.

Note: I shall abuse the → notation here. → b denotes a pointer to whatever b is. From Clojure, ^foo is understood to denote that the following expression has metadata foo. ^{:tag int} x for instance would be a legitimate expression, the shorthand for which is simply ^int x since the common case of metadata is tags. So ^cons → (e₁ e₂) would be understood to mean a pointer to the list (e₁ e₂) having the tag symbolically meaning cons.

Steele proposes for instance that if and cons may be encoded with such tags, showing the following example on Pg. 14:

(if a
  '(x y)
  (if c
    '(d)
    (cons e 69)))

The encoding of which is given to be as follows:

^if → (^var → (0, 3)
       ^list → (^symbol → x
                ^list → (^symbol → y
                         ^symbol → nil))
       ^if → (^var → (1, 2)
              ^list → (^symbol → d)
              ^comb → (^var → (0, 1)
                       ^more → (^number → 69
                                ^cons → nil))))

Another optimization noted on Pg. 13: while naively local variables could be implemented by maintaining and walking an association list, because Scheme is lexically scoped we can instead statically write a pair [n, m], where n is the number of binding stack frames (0 or greater) which we must traverse up to find the scope of the binding in question, and m is the binding number at that level. By way of example, consider the following:

(let ((x 1)
      (y 2))
  (let ((z 3))
    (+ x ; relatively local (1, 0)
       y ;       ...        (1, 1)
       z))) ;     ...        (0, 0)

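To make the lookup concrete, here is a minimal sketch (mine, not the paper's) of resolving an [n, m] pair against an environment represented as a sequence of frames, innermost first:

;; Inside the inner let above, the environment is '([3] [1 2]).
(defn lookup-local [env n m]
  (nth (nth env n) m))

(lookup-local '([3] [1 2]) 1 0) ;=> 1, the value of x
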
Steele makes use of several types, but does not discuss any of them at length here. Number, Symbol, List, Variable and Procedure all occur as tags for values. Furthermore, the ops If, Bind, Local, Funcall, Cons, Car, Cdr, Atom, Combination and More-args appear. The implication is that there are several more such types which are not mentioned; however, I have yet to discover them.

Memory controller

On Pg. 18, Steele discusses a multiple cycle memory controller (seemingly a simple stack machine) which operates on pointers and has the operations push, pop, drop, setq, cons, car and cdr. Push is a control state in which the value of the inbound bus is pushed onto the memory controller's stack. Pop writes the top of the stack out over the result bus. Drop discards the top of the stack. Setq writes the top of the stack to the second element of the stack, consuming both. Cons consumes two elements from the stack and pushes one (a new cell) back. Car and cdr both consume the top of the stack, pushing a replacement result value.
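
To check my own understanding, here is a toy functional model of those operations (entirely my sketch, not the memo's design; in particular I am assuming the top of stack becomes the cdr in cons):

;; State: an operand stack, a memory map of address -> [car cdr] cells,
;; and a free pointer. All names here are illustrative only.
(defn mc-push [st v] (update st :stack conj v))
(defn mc-drop [st]   (update st :stack pop))
(defn mc-pop  [st]   [(peek (:stack st)) (mc-drop st)])

(defn mc-car [st]
  (let [[ptr st'] (mc-pop st)]
    (mc-push st' (first (get (:mem st') ptr)))))

(defn mc-cdr [st]
  (let [[ptr st'] (mc-pop st)]
    (mc-push st' (second (get (:mem st') ptr)))))

(defn mc-cons [st]
  (let [[d st']  (mc-pop st)          ; assumed: top of stack is the cdr
        [a st''] (mc-pop st')
        addr     (:free st'')]
    (-> st''
        (assoc-in [:mem addr] [a d])
        (update :free inc)
        (mc-push addr))))

;; (mc-cons (-> {:stack [] :mem {} :free 0} (mc-push 'x) (mc-push 'y)))
;; ;=> {:stack [0], :mem {0 '[x y]}, :free 1}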

Steele's design does not feature a garbage collector due to chip area restrictions, and by way of apology to the effect that it could be done cites much work on the subject of garbage collection. Due to having a large (8k word) memory in my prototype and wanting to get something working, I will also ignore garbage collection for the time being.

One could implement a mark and sweep collector by changing the pointer format from having 8 bits of tag to having seven bits of tag and one gc flag bit. A single bit is kept to count the number of garbage collection passes which have occurred. When allocating, the gc flag is set to the value of this bit. When marking, the complement of this bit is used to denote that a cell has been reached, and that value is recursively written to all reachable objects starting from the register set. Collection and compaction of freed space into larger regions then occur as one would expect, and finally the counter bit is flipped. This gives the invariant that all storage is tagged as unmarked at the start of each collection pass, and saves the need to traverse all of storage to reset collection flags.
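
The bookkeeping is small enough to sketch (again my own illustration: one gc bit per cell plus one global epoch bit):

;; epoch is the single global pass-counter bit; each cell carries one gc bit.
(defn alloc-bit  [epoch] epoch)              ; written on allocation
(defn mark-bit   [epoch] (bit-xor epoch 1))  ; written when a cell is reached
(defn reached?   [cell epoch] (= (:gc cell) (mark-bit epoch)))
(defn flip-epoch [epoch] (bit-xor epoch 1))
;; After collection flips the epoch, every surviving cell's bit equals the
;; new epoch value, i.e. reads as unmarked, so no pass over memory is
;; needed to reset the flags.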


Strictly Tagged Clojure

This summer I've been an intern at Factual, and this is an experience report from the semiannual internal hackathon where Alan 'amalloy' Malloy and I experimented with using Alexander Yakushev's Skummet fork of Clojure to emit lean(er) bytecode.

Some motivation

One of Clojure's primary use cases is as a more palatable tool with which to interact with the rich Java ecosystem and existing Java libraries. Because of its facilities for such inter-operation, Clojure is even sometimes used to write performance sensitive code which would otherwise be written in Java. However there are limitations to the success with which this may be done.

While JVM bytecode is statically typed, Clojure is an aggressively dynamically checked language which makes pervasive use of the Object type to delay typechecking. To this end, Clojure will use JVM reflection to resolve instance fields and methods when performing interoperation. While correct for unknown types, reflective access is slow compared to direct access for known types, so it has long been possible to write type hints which advise Clojure about the runtime JVM type of a value and enable Clojure to use direct field access and direct method invocation rather than reflection.
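
For instance (a standard Clojure example, not specific to our work):

;; Unhinted, (.length s) compiles to a reflective lookup at runtime; the
;; ^String hint lets the compiler emit a direct call to String.length().
(defn strlen [^String s]
  (.length s))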

However, these hints are not types in the sense of a static type being a contract on the domain of values; they are merely hints for reflection elimination and place no contract on the domain of a hinted value.

This hinting behavior for reflection elimination comes at the cost of emitting checkcast instructions. As the JVM is statically typed, one cannot simply swear that a value is of a type; a checking cast must be used. Clojure, when emitting non-reflective method calls and field accesses, does not statically know (and makes no attempt to prove) that the value or expression in play is in fact of the type with which you may have tagged it. All local variables and function parameters which are not JVM primitives are stored as Objects, and so must be checkcast to the desired type on every use.

So are we stuck trading slow reflective access for checkcast instructions (which are faster, to be sure, but cause method bloat when doing lots of interop on previously checked values)?

Of course not! While Clojure does not currently have support for real contractual types when using tags, we can sure add such behavior!

Now, before you burn me at the stake for being a heretic: clearly, since Clojure does not currently have strict local types, we can't just make tags strict. TEMJVM actually makes that mistake, and as a result cannot compile clojure/core because, among others, clojure.core/ns-publics makes use of a type hint which, while safe for nonstrict type tags, is not correct in the context of strict tags. This has to be an additive, opt-in change.

So, what Alan and I did was create a new special fn metadata flag, ^:strict. If a fn body being compiled has the ^:strict tag, then and only then are type tags treated as a contract rather than being advisory. This is a strictly additive change because stock Clojure will ignore the metadata and emit less efficient but still correct code.

So as an example, let's consider the following fn:

(defn sum ^long [^Iterable xs]
  (let [iter (.iterator xs)]
    (loop [tot (long 0)]
      (if (.hasNext iter)
        (recur (+ tot (long (.next iter))))
        tot))))

Eliding a bunch of implementation details for brevity, this fn compiles on stock Clojure 1.7 to the following JVM bytecodes:

public final long invokePrim(java.lang.Object xs);
   0  aload_1 [xs]
   1  aconst_null
   2  astore_1 [xs]
   3  checkcast java.lang.Iterable [47]
   6  invokeinterface java.lang.Iterable.iterator() : java.util.Iterator [51] [nargs: 1]
  11  astore_2 [iter]
  12  lconst_0
  13  nop
  14  lstore_3 [tot]
  15  aload_2 [iter]
  16  checkcast java.util.Iterator [53]
  19  invokeinterface java.util.Iterator.hasNext() : boolean [57] [nargs: 1]
  24  ifeq 51
  27  lload_3 [tot]
  28  aload_2 [iter]
  29  checkcast java.util.Iterator [53]
  32  invokeinterface java.util.Iterator.next() : java.lang.Object [61] [nargs: 1]
  37  invokestatic clojure.lang.RT.longCast(java.lang.Object) : long [64]
  40  invokestatic clojure.lang.Numbers.add(long, long) : long [70]
  43  lstore_3 [tot]
  44  goto 15
  47  goto 52
  50  pop
  51  lload_3
  52  lreturn
    Local variable table:
      [pc: 15, pc: 52] local: tot index: 3 type: long
      [pc: 12, pc: 52] local: iter index: 2 type: java.lang.Object
      [pc: 0, pc: 52] local: this index: 0 type: java.lang.Object
      [pc: 0, pc: 52] local: xs index: 1 type: java.lang.Object

// Method descriptor #77 (Ljava/lang/Object;)Ljava/lang/Object;
// Stack: 5, Locals: 2
public java.lang.Object invoke(java.lang.Object arg0);
   0  aload_0 [this]
   1  aload_1 [arg0]
   2  invokeinterface clojure.lang.IFn$OL.invokePrim(java.lang.Object) : long [79] [nargs: 2]
   7  new java.lang.Long [30]
  10  dup_x2
  11  dup_x2
  12  pop
  13  invokespecial java.lang.Long(long) [82]
  16  areturn

So here we have two methods. The first, invokePrim, takes an Object and returns a primitive long, since we hinted our function ^long. The invoke method is a wrapper around invokePrim which provides for "boxing" (wrapping in an object) the primitive result of calling invokePrim. This allows our fn to be used both by code which wants and can use a long, and by code which doesn't know or care and just wants an Object back, as a normal fn would return.

So let's dig into the invokePrim method.

  1. Load xs off the arguments stack. It's just typed Object because that's the parameter type.
  2. Load the constant nil.
  3. Store the nil to the local named xs, thus clearing it. Note that in the locals table, xs has the type Object. This means that when we get xs from the local, we have to checkcast it again because we've lost type information by storing and loading it.
  4. checkcast the xs we loaded to Iterable since we don't really know what it is.
  5. invokeinterface of the .iterator method to get an Iterator from our now guaranteed Iterable.
  6. Store our Iterator into the iter local (discarding type information as above)
  7. Load the constant 0 from the class constant pool.
  8. Store the tot local (primitive typed)
  9. Load our iter
  10. checkcast because in storing it we forgot that it's an Iterator
  11. invokeinterface to see if there are elements left, producing a primitive boolean
  12. Branch on the boolean, going to step 22 (in this list) if false
  13. Load tot from the local
  14. Load iter from the local
  15. checkcast that iter is still Iterator
  16. invokeinterface to get the next value from the iterator, producing an Object
  17. invokestatic to call the static clojure.lang.RT method for converting Objects to primitive longs.
  18. invokestatic to add the two primitive longs on the stack
  19. Store the new value of tot
  20. Loop back to step 9 (load iter).
  21. Clear the stack
  22. Load tot
  23. return

So with the exception of the first checkcast, which ensures that the Object we got as an argument is in fact an instance of Iterable, the checkcasts after loads are all provably uncalled for. The static types of these values are known because their Java signatures are known, and the only reason we have to emit all these checks is that the Compiler throws that information away by storing these values in untyped (Object) locals.

The Hack

Every Expr in clojure.lang.Compiler already knows (or can state) its type, either as tagged or inferred, and whether it has such a tag. However, these stated Java classes are lies! A function invocation (IFn.invoke call site) is statically typed to return Object (unless it's a primitive call site, but we know that as well) no matter what the tag on the IFn being invoked may say. For example, clojure.core/str is tagged ^String and does indeed return a String, but after invoking the appropriate IFn the JVM doesn't know that there's a String on the stack because the IFn interface discards that type information. It just knows it has an Object. The fix is that we add an Expr.needsCast method and implement it for every implementation of Expr in the Compiler. Now, in strict mode, we know that unless Expr.needsCast returns true, the value on the stack after Expr.emit absolutely is of type Expr.getJavaClass. Otherwise we cannot avoid the checkcast.

We also have to change the behavior of locals so that we can emit locals with types other than Object. By typing locals we preserve their type information, as tagged or inferred, across loads and stores. This allows the Expr representing a local use to report that it needs a cast only when the usage of the local doesn't share the binding's tag and we cannot statically show that no cast is required.

With these changes, our modified compiler can indeed produce and use strictly typed locals. So let's add our annotation...

(defn ^:strict sum ^long [^Iterable xs]
  (let [iter (.iterator xs)]
    (loop [tot (long 0)]
      (if (.hasNext iter)
        (recur (+ tot (long (.next iter))))
        tot))))

And generate bytecode on our modified version of Skummet 1.7-RC1-r4 (again abbreviated).

public final long invokePrim(java.lang.Object);
     0: aload_1
     1: aconst_null
     2: astore_1
     3: checkcast     #30                 // class java/lang/Iterable
     6: invokeinterface #34,  1           // InterfaceMethod java/lang/Iterable.iterator:()Ljava/util/Iterator;
    11: astore_2
    12: lconst_0
    13: nop
    14: lstore_3
    15: aload_2
    16: invokeinterface #40,  1           // InterfaceMethod java/util/Iterator.hasNext:()Z
    21: ifeq          43
    24: lload_3
    25: aload_2
    26: invokeinterface #44,  1           // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
    31: invokestatic  #49                 // Method clojure/lang/RT.longCast:(Ljava/lang/Object;)J
    34: ladd
    35: lstore_3
    36: goto          15
    39: goto          44
    42: pop
    43: lload_3
    44: lreturn
    Start  Length  Slot  Name   Signature
       15      29     3   tot   J
       12      32     2  iter   Ljava/util/Iterator;
        0      44     0  this   Ljava/lang/Object;
        0      44     1    xs   Ljava/lang/Object;

public java.lang.Object invoke(java.lang.Object);
     0: aload_0
     1: aload_1
     2: invokeinterface #59,  2           // InterfaceMethod clojure/lang/IFn$OL.invokePrim:(Ljava/lang/Object;)J
     7: new           #13                 // class java/lang/Long
    10: dup_x2
    11: dup_x2
    12: pop
    13: invokespecial #62                 // Method java/lang/Long."<init>":(J)V
    16: areturn

The win compared to the original bytecode should be obvious. Sure enough, in the invokePrim method we only emit the one checkcast we absolutely have to have, because the xs argument could really be anything. The tot and iter locals are both statically typed, and so we can just load them and invoke the appropriate interfaces directly.

In some simple benchmarks, this optimization on this fn translates to a 5%-10% performance improvement, which isn't too impressive. However, other fns like clojure.core/str were able in our testing to get up to 20% performance improvements from strict locals.


This is the product of a two day hack. While Alan and I have been able to get it to work and emit working code, honestly this isn't something we're comfortable taking to production yet. Some clear wins remain on the table, such as emitting typed fn arguments (by popping arguments, checking them, and then putting them in typed local bindings for use) and taking advantage of types on closed-over locals.

What Didn't (seem) To Work

While Alan debugged and picked off some type-related wins we hadn't yet gotten, I worked on adding more inlining behavior to clojure.core. Much of core, especially the lexically early fns, are just thin wrappers around interop on clojure.lang.RT, which does have reasonable interface types on most of its methods.
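
clojure.core already uses :inline metadata for this in places; the kind of definition I was adding looks roughly like the following sketch (my-count is a hypothetical name, but clojure.core/count is defined in much this way):

;; :inline lets the compiler expand the call site directly into the
;; statically typed interop call, preserving the primitive int return.
(defn my-count
  {:inline (fn [x] `(clojure.lang.RT/count ~x))}
  [coll]
  (clojure.lang.RT/count coll))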

The hope was that, between the typed locals work, preserving more type information across calls to the Clojure standard library, and inlining the Clojure standard library where possible down to interop calls with clearly inferable types, we would be able to produce demonstrably faster code.

While in theory this should be at least break even and probably a win, we haven't managed to benchmark it well and show a clear win from the aggressive inlining work. In fact, the really interesting case, an aggressively inlined into able to use a typed, static transient loop, is impossible, because into is implemented in terms of reduce, which takes the reducing fn as a value and then dispatches via clojure.lang.IReduce in order to get fast iteration over chunked seqs. We can't statically inline through a call site where the fn is taken as a value, so that's the end of the line for that idea.

Inline Unrolling

We were however able to fully inline some interesting cases of clojure.core/assoc and clojure.core/conj. A common pattern in Clojure is to write a function f which has a zero argument case returning a constant, a one argument case returning the argument unmodified, and a two or more arguments case in which the operation f is reduced over the arguments provided, as sketched below. Rather than directly implement IFn, functions emitted by Clojure instead extend clojure.lang.AFn (source), which provides some out of the box support for functions of variable arity and function application.
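
For concreteness, here is the shape of that pattern (my-conj is a stand-in name; clojure.core/conj is written in essentially this shape):

(defn my-conj
  ([] [])                                   ; zero args: a constant
  ([coll] coll)                             ; one arg: returned unmodified
  ([coll x] (clojure.lang.RT/conj coll x))  ; the real binary operation
  ([coll x & xs]                            ; more args: reduce over them
   (reduce my-conj (my-conj coll x) xs)))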

Next Steps

These changes were motivated by internal performance requirements and will likely get polished until they are ready to be upstreamed back to Skummet. While we expect that the patch we have developed will never be included in Clojure as-is, if only due to its high impact, we hope to see this behavior or something like it enter the mainline in a future version of Clojure.

Edit 1: Skummet pull request submitted


Zach's Bookshelf

So ztellman posted some of his library, and I was bored (and also more than somewhat curious what Zach reads), so I figured I'd make a list in the hope of at least finding some interesting reading material. It's not complete, but it's pretty interesting as-is. I'll happily take changes if anyone else cares to read the original.

Top Shelf

  • The ocean at the end of the lane
  • Gravity's Rainbow
  • The Bug
  • The Cleanest Race
  • 99 novels
  • Everything and More
  • The Outsider
  • The Elements of Typographic Style
  • Mythologies
  • The Occult
  • The Hidden Dimension
  • The Sleepwalkers
  • Introduction to metaphysics
  • Notes on the synthesis of form
  • The Language Instinct
  • Basic techniques of Go
  • Simulacra and Simulation
  • Gödel, Escher, Bach
  • A Thousand Plateaus
  • Guattari
  • Warfighting

Second Shelf

  • Riddley Walker
  • Embassytown
  • Immortality
  • One hundred years of solitude
  • the general in the labyrinth
  • 2666
  • Veniss underground
  • The human comedy
  • budayeen nights
  • exile kiss
  • A fire in the sky
  • A Clockwork Orange
  • Cryptonomicon
  • Crystal Express
  • Count Zero
  • Snow Crash
  • Burning Chrome
  • The Diamond Age
  • Zero History
  • The Executioner's Song
  • Spook Country
  • Dirk Gently's Holistic Detective Agency
  • The Third Policeman
  • Death in Venice
  • The Book of Imaginary Beings
  • Sparrow
  • Illuminatus
  • At swim two birds
  • The Gray prince
  • The Demon Princes

Third Shelf

  • The Stories of Ray Bradbury
  • War against Cliche
  • The Complete Stories of JG Ballard
  • Stories of Three Decades
  • Seven Science Fiction Novels of HG Wells
  • The Best of Gene Wolfe
  • The complete short stories of Ernest Hemingway
  • Italian Folktales
  • Case and the dreamer
  • The essential Pinter
  • Three Great Works

Bottom shelf

A literal shitload of Philip K Dick