The Future of the LispM

This is also addressed to @lojikil, at whom I have threatened to write this post on multiple occasions. It is expected to piss off loper-os. Above all, it represents my opinions backed up only by a cursory research effort. You have been warned.

Background Of Historical LispMs

LispMs occupy a transitional period of computing industry history between time-shared mainframes and the single-user workstations that would become today's desktop computers. Prior to John McCarthy's invention of time sharing, all computers had been single-program machines which multiple users shared via batch scheduling: users would present fully formed programs, written by hand or with little tool assistance, to computer techs, who would run the programs and return printed output. The introduction of time sharing opened the way to exploring how to make computers useful interactively for many users sitting at terminals.

The concept of time sharing and interactivity led to the development of more dynamic programming environments, including the Read-Eval-Print Loop (REPL). At the same time, the falling cost of integrated circuits (and DoD funding for research into the design of such VLSI machines) began to put the first workstations, single-user machines featuring large memories, within the financial reach of well-funded AI laboratories which had previously depended on time-shared mainframes.

In order to improve interactive performance, Richard Greenblatt and Thomas Knight developed the cons machine, the first of the MIT Lisp machines. Priced at roughly $50,000 in 1970s dollars, the cons and its successor the cadr were single-user workstations built, it seems, primarily to address the memory restrictions of time-shared systems: the large memory footprint of Lisp programs led to page thrashing, with a detrimental impact on overall program performance. Hardware and microcode support was added for maintaining the relatively complex Lisp environment structures, and for many of the primitive lookup/load/store operations on cons cells that characterize a traditional Lisp implementation's data access pattern. This combination of machine support and microcoding allowed Lisp "compilers" to be far simpler, offloading the responsibility for maintaining complicated structures such as the environment from the compiler to the microcode system. Best of all, because the microcode corresponded relatively directly to Lisp, on the rare occasions when actually writing microcode (say, for device drivers) was called for, doing so was easy: users could transparently define and invoke microcode from their native Lisp environment.

Seeing a business opportunity, in the early '80s a number of MIT AI lab researchers departed and founded Symbolics Inc. and Lisp Machines Inc. for the purpose of building and selling Knight machine derived Lisp workstations. Ultimately, however, Lisp machines fell out of favor during the AI winter(s) and lost to what we now call the personal computer. I have seen the argument made that they were too big-ticket and missed the mass market as a result. Thus our story really begins.

Of Modern Hardware

In the years since the demise of the dedicated LispMs, Intel and the other major chip manufacturers have progressed from single-cycle to pipelined, cache-accelerated, superscalar and out-of-order machines. Knowledge of these designs is assumed; I suggest my earlier introduction series on these topics if you are not already familiar with them. At a high level, each step of this progression trades off chip area, transistors (complexity) and power budget to wring more instruction-level parallelism out of existing programs.

The Symbolics 3600 technical report from '83 claims a best case cycle time of 180ns (~5.5MHz) and a worst case of 250ns (4MHz). Arbitrary reads from main memory are priced at 600ns (about 3x the cost of a single instruction). This small (by modern standards) gap between processor and memory speed meant that one could write programs that did lots and lots of random memory access and still have reasonable performance expectations. However, because the 3600 is a stack machine, there is no opportunity for the instruction-level parallelism exploited by modern architectures: the state of the stack, on which every instruction depends, changes with every instruction. This forces all program execution to wait during slow operations like main memory reads, division and floating point math, due to the sequential data dependency on the stack. This is a fundamental architectural failure common to all stack machines, and it is skirted by modern software stack machines like the JVM by compiling the literal stack instructions down to a register machine in order to recover instruction-level parallelism.
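To make the data-dependency point concrete, here is a purely illustrative sketch (made-up mnemonics, written as plain Clojure data) of the same expression lowered for a stack machine and for a register machine:

    ;; The expression (+ (* a b) (* c d)) lowered two ways.

    ;; Stack machine: every instruction reads and writes the single stack, so
    ;; each step depends on the one before it and nothing can overlap.
    [[:push 'a] [:push 'b] [:mul]
     [:push 'c] [:push 'd] [:mul]
     [:add]]

    ;; Register machine: the two multiplies touch disjoint registers, so a
    ;; superscalar core can issue them in the same cycle and hide their latency.
    [[:mul :r1 'a 'b]
     [:mul :r2 'c 'd]
     [:add :r3 :r1 :r2]]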

Modern processors claim cycle times of about 0.33ns for a 3GHz machine, often running multiple instructions per cycle. Unfortunately, while processor core clocks have gotten faster since the days of the 3600, memories have not fundamentally improved. On modern hardware, 100ns for a random main memory read seems to be more or less normal. That is roughly a 500x improvement in cycle time against only a 6x improvement in main memory latency: a random read that cost about three instruction times on the 3600 now costs on the order of 300 cycles, an effective slowdown of roughly 90x relative to the processor. Now, this is admittedly worst case memory latency. Thanks to the L1 and L2 cache layers, read times of 1ns to 3ns are not uncommon, meaning that cache-efficient programs can reasonably expect reads hitting in the L1 or L2 cache to cost only a handful of cycles.
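For the skeptical, here is the back-of-envelope arithmetic with the figures above (a quick REPL check, nothing more):

    ;; cycle-time speedup, 3600 vs. a 3GHz part
    (/ 180 0.33)    ;=> ~545
    ;; main memory latency speedup over the same period
    (/ 600 100)     ;=> 6
    ;; cycles burned by a random read, then and now
    (/ 600 180.0)   ;=> ~3.3
    (/ 100 0.33)    ;=> ~300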

I mention this because techniques like cdr coding, and more recently tree-backed sequence representations such as SRFI-101, serve to bring the average case of list access and construction down from its worst case behavior of massive heap fragmentation (which defeats caching and prefetching) towards the behavior of cache-friendly sequential data layouts (classical arrays). While traditional linked list structures, due to potential heap fragmentation, provide potentially atrocious cache locality and thus cripple the performance of modern cache-accelerated machines, more modern datastructures can provide the traditional cons list interface while improving cache locality and thus performance. Guy Steele himself has even come out arguing this point.
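Clojure's vectors and chunked seqs are not cdr coding, but they make the same trade in spirit: the familiar first/rest interface survives while traversal walks contiguous 32-element leaf arrays instead of chasing one pointer per element. A minimal illustration at the REPL:

    ;; first/rest still work, but iteration proceeds a 32-element array
    ;; chunk at a time under the hood rather than cons cell by cons cell.
    (def xs (vec (range 100)))

    (chunked-seq? (seq xs))    ;=> true
    (first (seq xs))           ;=> 0
    (take 5 (rest (seq xs)))   ;=> (1 2 3 4 5)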

Without literal implementation as a stack machine traversing linked lists and providing special hardware support for binding and so forth, the LispM instantly loses much of the vaunted simplicity which makes it attractive to program directly, and becomes simply another register machine in the mold of modern ARM or Intel machines, with strange long-running microcoded instructions more reminiscent of the VAXen than of modern RISC-inspired processors. In short, due to the many performance limitations of linked lists as data structures, we as an industry will never again build machines, as the original LispMs were built, for the sole purpose of traversing and manipulating linked list data structures. Modern hardware simply offers better performance with more general applicability.

Of Modern Languages

We've also come a long way in language design and implementation. Compilers, once slow, have gotten faster and smarter. Virtual machines like the JVM, the CLR and the JavaScript engines have become widely used deployment targets. The ML and Haskell families of languages have introduced us to real abstract types and abstract effects, which can be used to build programs coupled only by the abstract properties of the data consumed and generated and the effects produced. Type inference is even making such fancy behavior manageable by mere mortals, while providing language implementations with more and more information with which to perform both program-level optimization and micro-optimization not possible in traditional naive Lisps.

Of LispMs Future

While we'll never build a hardware LispM again, I suspect that we will see LispM-like systems return one day in the not too distant future. Not as hardware, but as software: a virtual machine designed to run atop existing and entirely adequate modern hardware. Now, in 10 years someone may make a commercial hardware LispM, at which point I will be happy to eat my words, but as of right now I don't see it happening.

The JVM and the .NET CLR have proven to be amazing platforms. While their allocation requirements perhaps prohibit their use for implementing operating systems and driver code (not that people haven't tried this sort of thing), they do offer excellent and standardized platforms. It is my personal belief that, by leveraging the flexibility that Lisp dialects have for both data-driven DSLs and macro DSLs, it would be more or less trivial to implement an operating system using a DSL for generating platform-specific assembly. As the "kernel" OS is implemented in the final "host" LispM language, editing the kernel is possible, albeit difficult due to the inherent difficulties of hot swapping operating systems.
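To sketch what such a DSL for generating platform-specific assembly could look like (all names here are hypothetical; this is not an existing library), instructions can simply be Clojure data and the "assembler" a small function over them, with macros layering labels, jumps and calling conventions on top:

    ;; A toy data-driven assembler: each mnemonic maps to a function from
    ;; operands to the bytes that encode it (one-byte x86 opcodes here).
    (def x86-encodings
      {:nop (fn [_]      [0x90])        ; NOP
       :int (fn [[imm8]] [0xCD imm8])   ; INT imm8
       :ret (fn [_]      [0xC3])})      ; RET

    (defn assemble
      "Lower a vector of [mnemonic & operands] forms to a flat vector of bytes."
      [program]
      (vec (mapcat (fn [[op & args]]
                     ((x86-encodings op) (vec args)))
                   program)))

    (assemble [[:nop] [:int 0x80] [:ret]])
    ;=> [144 205 128 195]   ; i.e. 0x90 0xCD 0x80 0xC3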

Add a JIT (no, this isn't easy), preferably implemented in the same assembler DSL as the host operating system, and the stage is set for building a LispM environment as a virtual machine running atop modern hardware, using a modern JIT to achieve reasonable performance. From this vantage the way is clear to implementing the rest of the operating system and userland in a fully fledged Lisp with good reloading and introspection support, atop this kernel of a JIT and just enough of an OS to provide memory and process management.

This is, I think, an entirely reasonable and practical project for a smallish (~3 or fewer) team and a single target architecture (Intel). There was a project, Dream, an R4RS x86 assembler DSL in which an R4RS interpreter and operating system were implemented. It booted and worked more or less fine, and while it lacked polish I think it serves as an excellent proof of concept that such a self-hosting Lisp (assembler, OS, runtime) triple is viable. I don't argue that such an implementation will be trivially portable to a wide variety of target hardware: as experience shows, porting nominally portable Linux, FreeBSD or FreeRTOS is actually quite hard, and porting a JIT can at best be as hard due to tight coupling with the specific behavior of the target machine. But this is why we specify an abstract machine and then let implementations deal with the mess, as the JVM does.

I also think that Lisp as a language family could stand to learn some new tricks if someone were to make the effort to build a full Lispy OS/VM thing.

Clojure is awesome. I've written a boatload of it. It's an amazingly simple and elegant tool, and its persistent/immutable datastructures are absolutely worth stealing, as is its concept of hygienic macros via symbol qualification and probably the concept of protocols/interfaces, which Clojure kinda inherits from its primary host, Java.
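For the unfamiliar, "hygiene via symbol qualification" means that Clojure's syntax-quote resolves symbols against the current namespace and turns foo# into a fresh gensym, so macro expansions don't accidentally capture the caller's bindings (the gensym suffix below will differ from run to run):

    (ns user)

    `(inc x)
    ;=> (clojure.core/inc user/x)

    `(let [y# 1] y#)
    ;=> (clojure.core/let [y__1234__auto__ 1] y__1234__auto__)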

Racket is also awesome, although I can't claim to have written a bunch of it. Its approach to static compilation, its strong support for documentation and examples via Scribble, and its pattern matching and typechecking facilities are totally worth stealing.

Shen is interesting if only for its built-in Prolog engine, which enables searching for proofs of arbitrary program properties. While it has been criticized by the Haskell community for disregarding completeness, the ability to write and search for proofs about arbitrary properties of programs, not just types, is I think amazingly powerful, albeit an active research area.

But is it a LispM?

There are some people who will rail against this vision of mine, in which the "OS" is as low as we need to try and push a language. To my mind, the advantages of pushing a language all the way to the hardware level are, as argued above, scant enough that doing so simply does not justify the investment or the delay. Even the OS may be too low a level at which to push a language. Building an OS with a language, to host that language, ala Oberon, does offer maximal conceptual unity across the combined platform and a huge leap in the ease of integrating pipelines of different applications, but it also forcibly discards programs not written in HOST_LANG for HOST_PLATFORM. This is a problem if only because the Unix model of computing, as implemented on Linux with the help of the GNU userland, is second only to the Mach hiding inside of Mac OS X in terms of apparent developer adoption. Developers booting the next Dream for the first time won't find familiar text editors or tools. It will be a brave new world.

Maybe that's a good thing. Maybe we can escape the legacy of Unix with its untyped byte streams and instead revive some of Plan 9, with a basis in lispy goodness and with more types everywhere. Maybe there's even a place in this for algebraic effects. Maybe we can join urbit in looking towards a functional, fully networked future. Maybe. If only we can look past the obsolete machines and operating systems of yesteryear with which we seem content to circlejerk, and move forwards with appropriate context and vision.

Or maybe we’ll just be stuck with Linux, Unix, OpenSSL and the rest of our somewhat suspect tower of legacy code out of sheer inertia and continue getting caught up in language of the month rages out of collective ADD.

 We have not wings, we cannot soar;
       But we have feet to scale and climb
 By slow degrees, by more and more,
       The cloudy summits of our time.

 The mighty pyramids of stone
       That wedge-like cleave the desert airs,
 When nearer seen, and better known,
       Are but gigantic flights of stairs.

 The distant mountains, that uprear
       Their solid bastions to the skies,
 Are crossed by pathways, that appear
       As we to higher levels rise.

 The heights by great men reached and kept
       Were not attained by sudden flight,
 But they, while their companions slept,
       Were toiling upward in the night.

~ Longfellow

^d