Of Hobbes and Reason

This was originally a term paper of mine, written in markdown, and appears here because reasons.


In his Leviathan, Hobbes presents a theory of ethics founded upon individual equality, egoism and rationality as the basis for a political theory. This theory of ethics is developed from a "state of nature" free of rulers, duties and the other hallmarks of civil society, in which every man is at war with all others. From this state of nature, Hobbes claims to show that the laws of nature and the application of reason will lead men to discover contracts: exchanges of liberties by which men subjugate themselves in exchange for security. The most basic example of such a society rests on the contract not to murder other parties to the contract, to respect all contracts made with other parties to the contract, and the duty of all parties to repay breach of this contract with murder: a minimal civil society without a leader, constructed upon the death penalty alone.

There are several premises to unpack in this argument. The first is the premise of egoism, which will be taken undisputed as an axiom: while there may be flaws to be found here, I would be writing a different paper were I to challenge it.

The second premise is that contracts will be invented, and the third that they will be honored. If they are not invented, then the contractual surrender of rights out of mutual fear upon which the above minimum civil society is founded cannot occur. If they are not honored, then the minimum civil society cannot endure, because men will find it more in their self-interest to act against their fellows than to preserve the society, thus returning its members to the state of war.

Hobbes offers a lengthy defense of contracts as a product of egoist self-defense and reason in chapter 14. This requires that Hobbes offer a definition and a defense of reason, of which I will say more momentarily.

The third premise, that contracts will be honored, is defended in chapter 15. Hobbes presents an individual, The Fool, who claims that it is not so much in his egoist interest to respect a contract as it is to breach it. The argument against The Fool is, in short, that given the premises:

  1. No man can breach contract without being known to have done so.
  2. No man will enter into a contract with a man known to have breached contract for his own gain.
  3. No breach of contract can so secure The Fool through wealth or strength as to place him above fear of war.

Thus by application of reason Hobbes concludes that all men will discover that if they wish to avoid the state of fear and achieve peace, they must honor contracts. This is what he terms the third law of nature, and I will challenge it at length later.

We now have two premises which hinge on some workable definition of reason common to all men. If men fail to construct and appreciate this argument for a civil society founded on contracts using only their innate reason, then by the premise of egoism they must act as The Fool would, ignoring contracts and serving their own interests. This would render the escape from the state of war into a civil society which Hobbes seeks to lay out impossible. So what of reason?

Hobbes defines reason in [Ch5, P1] as follows:

When a man reasoneth he does nothing else but conceive a sum total from addition of parcels or conceive a remainder from some subtraction of one sum from another. Which, (if done by words) is conceiving of the consequence of the names of all the parts to the name of the whole and one part to the name of the other part.

This definition depends on his earlier construction of naming as symbolic reference to ideas in [Ch4, P12], and on the consequent presentation of meaningful speech or language as productions of names in [Ch4, P22], contrasted with meaningless speech in [Ch4, P24], whose terms have no absolute meaning and which cannot be relied upon for the construction of productions.

Prudence is defined in [Ch3, P7] as follows:

Some times a man desires to know the event of an action; and then he thinketh of some like action past, and the events thereof one after another, supposing that like events will follow like actions. Which kind of thoughts is called foresight and prudence and sometimes wisdom.

And in [Ch8, P11] Hobbes elaborates:

When the thoughts of a man that has a design in hand, running over a multitude of things, observes how they conduce to that design, or what design they may conduce unto; if his observations be such as are not easy, or usual, this wit of his is called prudence, and dependeth on much experience, and memory of the like things and their consequences heretofore. In which there is not so much difference of men as there is in their fancies and judgements; because the experience of men equal in age is not much unequal as to the quantity, but lies in different occasions, every one having his private designs. To govern well a family and a kingdom are not different degrees of prudence, but different sorts of business; no more than to draw a picture in little, or as great or greater than the life, are different degrees of art. A plain husbandman is more prudent in affairs of his own house than a Privy Counsellor in the affairs of another man.

That is, for Hobbes, reason as quoted above is any chain of inferences. Prudence is specifically those inferences based on knowledge of events past with a view to the future. There also seems to be some connotation that prudence is limited to one's own affairs (implicitly, that one has learned one's own business and has some mastery of it). This is suggested in [Ch13, P2]:

For prudence is but experience, which equal time equally bestows on all men in those things they equally apply themselves unto.

Well and good, and consistent with the above comments on prudence.

However of reason, Hobbes says in [Ch5, P17]

By this it appears that reason is not, as sense and memory, born with us; nor gotten by experience only, as prudence is; but attained by industry: first in apt imposing of names; and secondly by getting a good and orderly method in proceeding from the elements, which are names, to assertions made by connexion of one of them to another; and so to syllogisms, which are the connexions of one assertion to another, till we come to a knowledge of all the consequences of names appertaining to the subject in hand; and that is it, men call science.

This is a strange concession to make on the subject of reason. Nor is this the only place Hobbes makes this connection between reason and education. In [Ch8, P13] Hobbes says:

As for acquired wit (I mean acquired by method and instruction), there is none but reason; which is grounded on the right use of speech, and produceth the sciences. But of reason and science, I have already spoken in the fifth and sixth chapters.

This seems problematic because it suggests a confusion of terms between the process of reasoning, being the following of a chain of inferences, and some form of reasoning acquired by education, in the form of produced inferences rather than learned inferences (being prudence). This suggests that, far from being "natural", reason is in fact an acquired skill which men will not naturally come by. If prudence is not true reason as Hobbes defines it, and prudence does not lead to the development of reason, then Hobbes must offer a defense of how men in the state of nature will acquire his form of reason in order to discover and consequently follow natural law. But first we need to explore natural law itself.

In his introduction of natural law, Hobbes says [Ch13, P14]

The passions that incline men to peace are: fear of death; desire of such things as are necessary to commodious living; and a hope by their industry to obtain them. And reason suggesteth convenient articles of peace upon which men may be drawn to agreement. These articles are they which otherwise are called the laws of nature ...

Hobbes goes on in chapter 14 to present what he holds to be a train of "right reason" producing his first and second laws of nature, then contracts. However his conception of natural law must be examined first before we accept the consequences and allow his philosophy. From [Ch14, P3],

A law of nature, lex naturalis, is a precept, or general rule, found out by reason, by which a man is forbidden to do that which is destructive of his life, or taketh away the means of preserving the same, and to omit that by which he thinketh it may be best preserved.

The first part of this definition, that natural law is "a precept or general rule found out by reason by which a man is forbidden to do that which is destructive of his life", is entirely uncontroversial and acceptable. It is simply the compounding of Hobbes' axiom of egoism (stated as the jus naturale in [Ch14, P1]) with either of reason or prudence. With reason as defined by Hobbes, being sequential production from learned principles, an egoist can be expected to do nothing which he may by reason expect to be destructive of his own interests. Likewise, an egoist may be expected to do nothing which he may by prudence (reasoning over experience) expect to be destructive of his own interests. However the second part of the conception of a natural law is difficult, as Hobbes says "... and to omit that by which he thinketh it may be best preserved". This is not a statement of egoism; it is rather a statement of altruism, and it contradicts the premise of egoism without explanation.

By sequential production, one could conclude that seeking what best preserves one's interests is certain to incite conflict, and thus strife and even violence with others, as one has no more strength with which to claim or hold such goods than any other. Consequently, one could conclude that the rational and egoist thing to do is to balance avoiding conflict against seeking what best serves one's interests, as the risk of conflict (and one's death) outweighs the potential good of best preserving one's interests while other vehicles whereby to preserve them remain. However Hobbes neglects this production of rational egoism, instead simply stating this principle of altruism as the basis for his lex naturalis. The following quote

And consequently it is a precept, or general rule of reason: that every man ought to endeavor peace, as far as he has hope of obtaining it;

is then, given Hobbes' construction of lex naturalis, circular, for the premise (the definition of lex naturalis) already encodes the principle of altruism by which men are commanded (lex as opposed to jus) to avoid conflict. This difficulty could perhaps be worked out by using a more modern, game-theoretic construction of altruism based on relative goodness and risk assessment as I have sketched above, but Hobbes fails to do so, leaving the above mentioned defense of natural law from reason wanting.

Throughout his discussion of the production of natural law in chapter 14, and the defense of the third law of nature (respect contracts) in chapter 15, Hobbes seems to use the word reason in a different sense than he defined it above. As defined above, reason is the sequential production of consequences from learned truths. In the state of nature presented by Hobbes, however, there is and can be no formal learning as we know it. Consequently Hobbes seems to fall into the fallacy of equivocation: either he intends a second meaning, some "common reason" requiring no education (for The Fool must be defeated with "common reason" alone) and distinct from prudence, or he intends prudence as he previously defined it and fails to clarify.

If we allow that this is a confusion of terms, and that Hobbes does indeed intend some form of "common reason", then we come to another difficulty. If the presented "common reason" is sufficient to discover the laws of nature and common to all men, then given the same set of facts all men can be expected to reach the same conclusion. It is exactly this expectation of men all arriving at a single correct inference which enables the expectation that men will devise and establish a contractually ordered society. However Hobbes does not seem to share this view, which is required to support his own argument. Rather he seems to believe that reason can very much go wrong, and goes so far as to lay out what he terms "wrong reason" [Ch5, P3]:

And as in arithmetic, unpracticed men must, and professors themselves may often err and cast up false, so also in any subject of reasoning, the ablest, most attentive and most practiced men may deceive themselves and infer false conclusions, not but that reason itself is always right reason, as well as arithmetic is a certain and infallible art.

Rather than appealing to the wisdom of all men, Hobbes lays out an appeal to the wisdom of chosen men, whom he calls arbitrators or judges. Again from [Ch5, P3]:

But no one man's reason nor the reason of any one number of men makes up the certainty, no more than any account is therefore well cast up, because a great many men have unanimously approved it. And therefore, as when there is controversy in an account, the parties must by their own accord set up for right reason the reason of some arbitrator or judge, to whose sentence they will both stand, or their controversy must either come to blows or be undecided, for want of a right reason constituted by Nature, so is it also in all debates of what kind soever.

In the context of Leviathan overall as a political work, this structure makes sense, as it serves to establish a rationale for the imposition of judges and kings whose reason shall be taken as right reason and by which all men shall abide; but as a means of establishing that system it is self-defeating. Reason is not common to men by his own words; rather it is a product of education, distinct by study from prudence. No provision for the general acquisition of such reason is made, and furthermore, when men do use reason it cannot be relied upon. Thus the entire project of producing contracts and the minimum civil society above is called into question, as it is founded upon an appeal to a confused and self-defeating definition of reason which all men must have and yet which Hobbes says men lack.

Of prudence, nothing more is said. Perhaps the project could be saved by appealing to the general prudence of men rather than to their reason. In the state of nature, general conflict over the means to one's preservation is presented as inevitable. Such conflict being inevitable and all men being in conflict, all men will experience (and some will survive) such conflict. Prudence then would lead men to remember such conflict when thinking of the means to future ends, the game-theoretic conception of altruism above would by prudence follow, and men could thereby save themselves from destruction and war by inventing peace and contracts. But Hobbes does not make this argument, and so I shall put down my pen somewhat more disillusioned with Hobbes than when I picked it up.

^d

Toothpick: A theory of bytecode machines

In working on my Batbridge simulators and other adventures in computing hardware, one of the problems I have repeatedly encountered is the need to generate bit encodings of symbolic instructions. When working with "well known" architectures with existing tooling support, one can usually find an "off the shelf" assembler either in the GCC collection or elsewhere. However when you're working against a custom instruction set you're on your own for tooling.

So let's invent yet another assembler!

An assembler is simply a program that describes how to construct a binary instruction encoding from a set of symbolic, human-writable mnemonics representing data manipulating directives to computing hardware. add, sub[tract], mul[tiply] and jump are good examples of such mnemonics. Each operation which a given "machine" (hardware/software interpreter) will accept is typically specified as a sequence of bits and bit encoded fields (sometimes abbreviated in octal or hexadecimal notation). Batbridge, for instance, specifies that the add instruction is the bit string 110000tttttaaaaabbbbbiiiiiiiiiii, where the prefix 110000 (0x30) indicates the addition operation, the bits named t encode a number t∈[0..30] naming the register to which the result of the addition will be stored, the bits named a (subject to the same constraints as t) name the register from which the "left" operand will be drawn, the bits named b name the "right" operand in the same way, and i is an arbitrary signed 10 bit integer (the sign is the 11th bit).
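To make that concrete, here is a minimal hand-rolled sketch of that encoding in Clojure. The field offsets (opcode in the high six bits, then t, a, b and i) follow the layout implied by the opcode maps later in this post; the function name is purely illustrative.

(defn encode-add
  "Encodes the Batbridge add instruction described above as a 32-bit value.
   t, a and b are register numbers in [0..30]; i is a signed 11-bit literal."
  [t a b i]
  (-> (bit-shift-left 0x30 26)                       ;; 110000, the opcode prefix
      (bit-or (bit-shift-left (bit-and t 0x1F) 21))  ;; target register
      (bit-or (bit-shift-left (bit-and a 0x1F) 16))  ;; "left" source register
      (bit-or (bit-shift-left (bit-and b 0x1F) 11))  ;; "right" source register
      (bit-or (bit-and i 0x7FF))))                   ;; 11-bit immediate, two's complement

;; (format "%08X" (encode-add 0 1 2 3)) ;=> "C0011003"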

Sitting down with a specification, you or I could construct a set of functions from appropriately constrained values to bit encoded instructions as laid out in an instruction set specification. In writing such an assembler, however, you will discover that you are simply encoding, in the programming language of your choice, a translation table laid out in full by the instruction set documentation. Realizing that the assembler we are constructing by hand represents a printer for the bytecode machine's reader, we can think of bytecodes as a language in the same sense that any other set of strings and productions constitutes a language.

A Theory of Bytecode Machines

It happens that we have good formal theories about languages and their structures. If we stop and squint, we can see that individual opcodes are analogous to terminals or verbs. If we allow ourselves to model opcodes as words in a language, then a program (a production of words) is a sentence for which we can establish a grammar. For instance, an ELF formatted program could be considered a sequence of blocks of instructions (verbs) defined in terms of words (opcodes) and wrapped in an appropriate header/footer (the ELF file structure). As these are all linguistic constructs, a program capable of generating these values must be a direct production of the specification which lays out these productions, in the same way that a lex/yacc lexer/parser pair is a mechanical production of the lexing and parsing rules which define a computer language.

This leads to the following idea: just as we now rarely write parsers and lexers by hand, preferring to derive them automatically from specifications of the languages we wish them to process, and since, as suggested above, a bytecode is simply a language over bit strings rather than ASCII or Unicode characters, we should be able to use the same class of tools to generate assemblers and disassemblers as we use to generate parsers and printers. We just need to construct formal languages describing the instructions our machines accept.

An Assembler Generator

Just as we have theories about "interesting" sets of languages and the parser techniques which may be applied to efficiently recognize and process them, so too with bytecodes. Rather than expecting to automatically assemble arbitrary programs to arbitrary bytecodes, we must first recognize that, since a bytecode is a language over bit strings and there exist languages which require a Turing machine to recognize (in addition to uncomputable languages), one could invent an "interesting" (but, I will argue in a minute, silly) bytecode for which no assembler or recognizer can be automatically generated.

So what features define "interesting" bytecodes? In hardware terms, as I discuss elsewhere, memories and lookup tables are just about the most expensive things you can build in terms of chip area and time. As a result, the overwhelming majority of computer architectures are not what I shall call "micro-programmable"; that is, users cannot give meaning to new sequences of bits, and instead interact with the computer by chaining together a few bit-verbs or opcodes which are built into the chip. There do exist micro-programmable computers, FPGAs, but they tend to be special purpose prototyping machines and are not in wide usage compared to fixed instruction set machines.

Another characteristic of "interesting" bytecodes is in the name: bytes, or words. Many years have gone by since bit addressed machines were last built with commercial intent. Instead, commercial machines and off the shelf memories tend to address "blocks" or "lines" of bits: strings a power of two in length, which may be divided into substrings smaller powers of two in length as the user sees fit, at no additional hardware complexity cost, either by selecting predefined subdivisions of lines known as "words" or by using bit masking and shifting to select strings smaller than a single word. Strings larger than a single word must be dealt with by using multiple operations on words.

This theory of a fixed word size, fixed instruction set machine characterizes the overwhelming majority of instruction sets and machines designed and built by Intel, ARM, Symbolics, Thinking Machines, and others over the past decades due mainly to the ease of implementation and efficiency of such designs.

So what does this theory about a "useful" bytecode machine buy us? It implies that we can describe a "useful" machine with a bitstring length denoting word size, an endianness, and an enumeration of the various opcodes and how to generate them as bitstrings. Together these completely describe an "interesting" instruction set as characterized above.

Assemblers are simply programs that translate instruction set mnemonics, written by a programmer or more often than not a compiler, to bytecodes which a machine can execute. The details of computing label addresses are for the most part independent of the specific architecture, in so much as they are determined by the properties of the specific target bytecode described above. This suggests that there must exist a function

(assemble p ← Platform c ← Codes)

Which makes sense if you consider a single target assembler to be the partial application of such an assembler to a platform.
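As a sketch of that partial application, assuming an assemble function with the signature above and a batbridge architecture value like the one described later in this post; both names are illustrative rather than the actual Toothpick API.

(def assemble-batbridge
  "A single-target assembler: (assemble-batbridge codes) ≡ (assemble batbridge codes)."
  (partial assemble batbridge))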

Talk is cheap, show me the code

First things first, we need a way to represent an "interesting" instruction set.

(deftype Architecture
    "A structure representing an \"interesting\" bytecode architecture
    from which a parser or a generator can be computed."
  [word-width         ;; ← Num, how many bits in a word
   pointer-resolution ;; ← Num, how many bits forwards does *(x+1) go
   opcodes            ;; ← Map [Mnemonic → Opcode], opcode formatting rules
   ])

So what do individual opcodes look like? We're gonna need some bit vector formatting engine to do real bytecode generation with.

I will define an opcode to be a bitstring of fixed length (this is important: there are instruction sets with variable length "opcodes", but as the different length forms have different operational semantics I shall define them to be different operations and declare the instruction set architects silly people), composed of a sequence of concatenated parameters, or fields, and constants, each of which may be either signed or unsigned. Fields are either named or anonymous, and the set of named fields in a single opcode must contain no repetitions. We want to be able to write something like the following:

;; IFLT 0x20  100000 _____ aaaaa bbbbb iiiiiiiiiii
;; execute the next instruction IFF (< a b)
(opcode :iflt
  (const-field            :icode 6  0x20)
  (enforced-const-field   :_     5  0)
  (parameter-field        :a     5  register?)
  (parameter-field        :b     5  register?)
  (signed-parameter-field :i     11 literal?))

So in this example instruction from Batbridge, we define an instruction as the ordered bit string where the bits [0..5] are named :icode and fixed with the constant value of 0x20. We then have an anonymous field 5 bits long taking the bits [6..10] which we force to have the value of 0 since they are defined to be ignored by the instruction set. We then have a pair of (unsigned) 5-bit registers, which must satisfy some guard predicate ensuring that the value in question fits within the value domain which the instruction set allows for registers. Finally we have a signed field 11 bits long, which again is guarded with a value domain predicate. This notation encodes the bit format string, as well as the value domains for each element of the bit format string in a form that can be easily inspected by either an assembler or a disassembler program.

So what does that expression build out to as a data structure?

{:width 32,
 :params (:_ :a :b :i),
 :fields ({:offset 26,
           :width 6,
           :name :icode,
           :type :const,
           :value 32}
          {:offset 21,
           :width 5,
           :name :_,
           :type :enforced-const,
           :value 0}
          {:offset 16,
           :width 5,
           :name :a,
           :type :unsigned-field,
           :pred #<batbridge$register_QMARK_ toothpick.isa.batbridge$register_QMARK_@4bd4095>}
          {:offset 11,
           :width 5,
           :name :b,
           :type :unsigned-field,
           :pred #<batbridge$register_QMARK_ toothpick.isa.batbridge$register_QMARK_@4bd4095>}
          {:offset 0,
           :width 11,
           :name :i,
           :type :signed-field,
           :pred #<batbridge$literal_QMARK_ toothpick.isa.batbridge$literal_QMARK_@38334886>})}

Given this data structure and a sequence (opcode . params), we can trivially compute a map (zipmap (:params (get isa opcode)) params), then fold right over the fields of the instruction, computing each field and shifting and or-ing it into place in the bit vector, to construct the bit encoding of the instruction.
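As a sketch of that fold (not the actual Toothpick implementation), assuming an isa map from mnemonics to instruction maps shaped like the :iflt structure above, and that the parameters arrive as raw integers:

(defn assemble-instruction
  [isa [mnemonic & params]]
  (let [op  (get isa mnemonic)
        env (zipmap (:params op) params)]
    (reduce (fn [acc {:keys [offset width type value] :as field}]
              (let [v    (case type
                           (:const :enforced-const) value
                           (get env (:name field)))
                    mask (dec (bit-shift-left 1 width))]
                ;; mask the (possibly signed) value to its width, then shift it
                ;; into position and OR it into the accumulated bit vector
                (bit-or acc (bit-shift-left (bit-and v mask) offset))))
            0
            (:fields op))))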

As implemented this is a little fragile, because it assumes that the computer running the assembler assembler has an integer size larger than or equal to the target's, as attempting to construct an int (aka bit vector) that's too large will yield... interesting issues. Future fixes are to use a real sequence of bits, or to rewrite this logic in terms of multiplication by two and addition.

Deriving a parser for this structure is also (fairly) straightforward: one can simply generate an alternation matcher over bitstring matchers matching the individual instructions. "Interesting" ISAs tend to put the opcode specification in the "low" bits of each instruction word, so we could simply generate a jump table by masking on the least significant bits, but that's not implemented yet.
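The complementary direction can be sketched over the same field maps: an instruction word "matches" an opcode when every constant field holds its declared value, and the remaining fields are then masked back out. This is illustrative only; signed fields come back as raw bit patterns here, and none of these names are Toothpick's.

(defn extract-field [word {:keys [offset width]}]
  (bit-and (bit-shift-right word offset)
           (dec (bit-shift-left 1 width))))

(defn matches-opcode? [word op]
  (every? (fn [{:keys [type value] :as field}]
            (or (not (contains? #{:const :enforced-const} type))
                (= value (extract-field word field))))
          (:fields op)))

(defn disassemble-word
  "Returns [mnemonic {param-name value}] for the first opcode whose constant
   fields match word, or nil if none do."
  [isa word]
  (first
   (for [[mnemonic op] isa
         :when (matches-opcode? word op)]
     [mnemonic (into {}
                     (for [{:keys [type] :as field} (:fields op)
                           :when (not (contains? #{:const :enforced-const} type))]
                       [(:name field) (extract-field word field)]))])))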

Also this representation falls into jneen's type tagging pitfall, but this is old code and implementation details thereof so I'll forgive myself and fix it later.

We can then define an "architecture" by constructing a composite map as above having a word size and pointer interval specification and a map from keywords naming instructions to many such instruction maps.
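As a sketch, such a map might look like the following, assuming Batbridge's 32-bit instructions; the pointer-resolution value and the iflt var (bound to the instruction map shown above) are illustrative rather than taken from the real Toothpick source.

(def batbridge
  {:word-width         32
   :pointer-resolution 8        ;; assuming byte addressing
   :opcodes            {:iflt iflt
                        ;; ... :add, :sub and the rest of the ISA
                        }})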

So, we can describe an instruction stream as a sequence of instructions, and we can assemble individual instructions by interpreting from their representation above, so if we wanted to encode a sample program

[[:label :top]
 [:op [:add [:param [:register 0]]
            [:param [:const 0]]
            [:param [:const 1]]]]
 [:label 1]
 [:op [:add [:param [:register 1]]
            [:param [:const 0]]
            [:param [:const 2]]]]
 [:label 2]
 [:op [:add [:param [:register 0]]
            [:param [:register 0]]
            [:param [:register 1]]]]
 [:label 3]]

We can simply foldr a label location computing operation over the instruction stream, since all of our opcodes are defined to be fixed length even if they are abstractly variable length, and then in a second pass assemble each instruction against the label location context and the instruction set, giving us a sequence of bytecodes. Done! ∎
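A sketch of those two passes, assuming fixed-width instructions; assemble-instruction is the per-instruction encoder sketched earlier, and resolve-params is a hypothetical helper that translates the [:param ...] forms and substitutes label addresses, elided here for brevity.

(defn assemble-stream
  [isa instr-bytes program]
  (let [labels (first
                (reduce (fn [[labels addr] [tag x]]
                          (case tag
                            :label [(assoc labels x addr) addr]     ;; record the address
                            :op    [labels (+ addr instr-bytes)]))  ;; advance the pointer
                        [{} 0]
                        program))]
    ;; second pass: assemble every :op against the label context
    (for [[tag op] program
          :when (= tag :op)]
      (assemble-instruction isa (resolve-params labels op)))))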

This is essentially my Toothpick project, which seeks to provide exactly this meta-assembler facility and has already proven its worth to me in automating the generation of assemblers for the various toy architectures with which I've worked at school. It's not done yet and as noted above it needs some spit and shoeshine but I think it represents a remarkably simple and powerful idea.

The real power of this approach is that, extended, it could enable the automatic generation of LLVM backends! If an instruction specification were to include its operational semantics written in terms of LLVM instructions, one could imagine a trivially derived LLVM backend which sequentially scans emitted LLVM assembly, matching out the platform translation of the instruction "at point", and then assembles the resulting instruction stream as above. Since LLVM already provides a massively portable C compiler infrastructure, this means that given only a formal platform specification one could construct a mediocre C compiler.

To take this idea even further, one could also imagine generating a platform simulator, consuming either symbolic instructions or bit encoded instructions, from nothing more than the ISA augmented with semantics.

Generating muxing logic in Verilog or VHDL should also be trivial from this structure.

Generate all the things! Correctness by specification & construction! Toy architectures for everyone!

^d

牛: Some motivation

Here is some of my motivation for Oxlang. I'm sorry it isn't shorter, I gave up on copy editing to turn in. This was originally a personal email and appears almost unaltered.

Clojure is awesome. It is grammatically simple with a classical lispy syntax, and its cheap immutable data structures enable equational reasoning about most problems and solution tools.

My top bugbears & goals:

  1. Contributor culture. The contribution process should be as open, friendly and low friction as possible; see Guy Steele's comments on growing a better language from one with warts through acceptance of contributions. The GHC team has a "3% rule": if a change improves the language or the type checker and comes at a cost of less than a 3% slowdown, it has a high probability of being accepted, and acceptance/rejection is publicly debated ala LKML. Commentary on the Clojure process may be inferred.

  2. Formalism. Formalism gives portability and is the only real route thereto. Parser spec. Evaluator spec. Kernel standard library spec with a features list. This enables portability of code and data structures and keeps us out of the JVM centric mess that Clojure itself is close to right now.

  3. Documentation. I built Grimoire (Introduction post) because frankly I think that the existing Clojure documentation is lacking and there is no interest from the Core team in improving it, as evidenced by the consistent rejection of patches which change doc strings beyond simple typo fixes. Goes with the formalism requirements.

  4. Hide the host. That formalism should include the call stack and exceptions. Use the host under the hood, sure, but don't leak the host for the most part. To paraphrase Paul Phillips, just because Java or Clojure jumped off a bridge doesn't mean we can't have nice things like Data.Ord. This likely means making interop deliberately second class, possibly even explicitly generated from introspection, but not the common case. Musings on this may be found here.

  5. Types (and protocols/interfaces) matter. Clojure and much of the Cognitect crew seem to pretend that they don't. It feels like not a week goes by without someone in IRC or twitter posting a function that checks the run time type of some value against some implementation type of Clojure's core. Aphyr obligingly posted today's, clojure.core.incubator/seqable?; I have a few of my own. Enable static reasoning rather than pretending it doesn't matter and that nobody does it. Edit: from the horse's mouth. Bake in type inference and something akin to prismatic/schema.

  6. Purify the language implementation. Practice what you preach. Oxlang/Kiss's immutable environments are absolutely required. Death to &env, *ns* and mutable namespaces; do reloading by dependency tracking.

  7. Give thought to being self-hosting. It's not essential off the bat, but it makes porting & compatibility easier in the long run. GHC's bootstrap & rebuild itself process beckons, and Toccata plays the same trick.

  8. Give more thought to the module system & versioning culture. I want to be able to depend on "something implementing these functions in the namespace clojure.core, then require those and alias them in as such", not "Dear lein, this exact version of this library, thanks, no ranges, can't trust anyone. Okay let's just load crap up". The language should enforce versioning b/c programmers sure as hell can't be trusted with it. Note that this goes with my first point to lower the cost(s) of churn in the standard library.

  9. Tooling. I'm with Chas in his comments here. As hinted in 8, the language needs to provide as much support as possible for editors, for REPLs, for packages, for optimizing, for testing. These are the things that make or break the UX of the language and if the UX isn't good people will just go back to writing node.js or whatever the latest flavor of the week is. Tooling is hard and takes time, but it must be a goal. I also went over this in 'On Future Languages'.

All of these (save the parser and evaluator spec) are goals towards which work can be accumulated over time once the semantics of the language are agreed upon. Tooling, types, speed and correctness analysis can all be done if change (growth) is cheap and the goals are agreed upon.

I think that given these changes it would be pretty easy to design embedded and memory restricted targets for "Clojure", and I'll admit that the goal of writing an OS with this language is near to my heart, but for the most part I want to improve the platform independent experience, since most of the code I write isn't platform bound; it's exploratory scratch work.

^d

Oxcart going forwards

When I last wrote about Oxcart, work had pretty much gone on hiatus due to my return to school. As there has been some recent interest in the status of Lean Clojure overall, I thought I'd take the opportunity to review the state of Oxcart and the plan for Oxcart going forwards.

Oxcart and Clojure

The static linking approach and lack of run time dynamism found in Oxcart is explicitly at odds with the philosophy of core Clojure. Where Clojure was designed to enable live development and makes performance sacrifices to enable such development as discussed here, Oxcart attempts to offer the complementary set of trade offs. Oxcart is intended as a pre-deployment static compiler designed to take a working application and, to the greatest extent possible, wring more performance out of unchanged (but restricted) Clojure, as PyPy does for Python. As Oxcart explicitly avoids the dynamic bindings which Clojure embraces, Alex Miller, the Clojure Community Manager, has repeatedly stated that he expects to see little cross pollination from Oxcart and related work to Clojure itself.

This would be all well and good, were it not for the existing behavior of Clojure's clojure.lang.RT class. As currently implemented in Clojure 1.6 and 1.7, RT uses its static initializer (<clinit>) to compile the following resources with clojure.lang.Compiler.

  • "clojure/core"
  • "clojure/core_proxy"
  • "clojure/core_print"
  • "clojure/genclass"
  • "clojure/core_deftype"
  • "clojure/core/protocols"
  • "clojure/gvec"
  • "clojure/instant"
  • "clojure/uuid"

These represent about 10799 lines of code, all of which could easily be statically compiled and, most importantly, tree shaken ahead of time by Oxcart or another tool rather than being loaded at boot time. This also means that the unavoidable booting of Clojure itself from source can easily dominate the loading of user programs, especially after static compilation to raw classes. A quick benchmark on my machine shows that booting a Clojure 1.6 instance and loading an ns containing only a -main that prints "hello world" takes ~2.8 seconds from source, compared to ~2.5 seconds booting the same program compiled with Oxcart, suggesting that the cost of booting Clojure itself accounts for the shared ~2.5 second boot time. This is the test.hello benchmark in Oxcart's demos.

$ git clone git@github.com:oxlang/oxcart.git &&\
  cd oxcart &&\
  git checkout 0.1.2 &&\
  bash bench.sh test.hello
Running Clojure 1.6.0 compiled test.hello....
Hello, World!

real    0m1.369s
user    0m3.117s
sys     0m0.083s
Oxcart compiling test.hello....
Running Oxcart compiled test.hello....
Hello, World!

real    0m1.212s
user    0m2.487s
sys     0m0.073s

Then there's the test.load benchmark. This benchmark as-is pushes credulity, because it compiles 502 functions of which only the -main, which uses none of the other 501, will be invoked. This reflects more on program loading time than on the loading time of "clojure/core", but I still think it is instructive as to the costs of boot time compilation, showing a ~7s boot time for Clojure compared to a ~2.5s boot time for Oxcart. As arbitrary slowdowns from macroexpansions which call Thread/sleep would be entirely possible, I consider this program within the bounds of "fairness".

A Fork in the Road

There are two solutions to this limitation, and both of them involve changing the behavior of Clojure itself. The first is my proposed lib-clojure refactor. Partitioning Clojure is a bit extreme, and in toying with the proposed RTUtil changes over here I've found that they work quite nicely even with a monolithic Clojure artifact. Unfortunately there seems to be little interest from Clojure's Core team (as judged via Alex's communications over the last few months) in these specific changes or in the static compilation approach to reducing the deployment overhead of Clojure programs. The second is to fork Clojure and then make the lib-clojure changes, which solves the problem of convincing Core that lib-clojure is a good idea but brings its own suite of problems.

Oxcart was intended to be my undergraduate thesis work. While the 16-25% speedup previously reported is impressive, Oxcart does nothing novel or even interesting under the hood. It performs only four real program transformations: lambda lifting, two kinds of static call site linking, and tree shaking. While I suppose that is impressive for an undergrad, the project also leaves a lot on the table in terms of potential utility due to its inability to alter RT's unfortunate loading behavior. I also think there is low hanging fruit in unreachable form elimination and effect analysis, probably enough that Oxcart as-is would not be "complete" even were its emitter more stable.

I'm reluctant to simply fork Clojure, mainly because I don't think that the changes I've been kicking about for lib-clojure actually add anything to Clojure as a language. If I were to fork Clojure, it'd be for Oxlang, which actually seeks to make major changes to Clojure, not just tweak some plumbing. But writing a language so I can write a compiler is frankly silly, so that's not high on the options list. The worst part of this is that forking Clojure makes everything about using Oxcart harder. Now you have dependencies at build time (all of "stock" Clojure) that don't exist at deployment time (my "hacked" Clojure). Whatever hack that requires either winds up complicating everyone's project.clj or lands in an otherwise uncalled for leiningen plugin, just like lein-skummet. Tooling needs to be able to get around this too, since every library you'd want to use explicitly requires [org.clojure/clojure ...], which totally goes away once Oxcart emits the bits you need and throws the rest out. Most of all, I don't want to maintain a fork for feature parity as time goes on. However I also don't see any other way to get around RT's existing behavior, since the RTUtil refactor touches almost every Java file in Clojure.

Flaws in the Stone

Oxcart itself also needs a bunch of work. While I think that Nicola has done an awesome job with tools.analyzer and tools.emitter.jvm, I'm presently convinced that while it's fine for a naive emitter (which is what TEJVM is), it's a sub-optimal substrate for a whole program representation and for whole program transforms.

Consider renaming a local symbol. In the LLVM compiler infrastructure, "locals" and other program entities are represented as mutable nodes to which references are held by clients (say, call sites or use sites). A rename is then simply an update in place on the node to be changed; all clients see the change without any further bookkeeping. This makes replacements, renames and so forth constant time updates. Unfortunately, due to the program model used by tools.analyzer and tools.emitter.jvm, such efficient updates are not possible. Instead most rewrites degenerate into worst case traversals of the entire program AST when they could be much more limited in scope. Cutaway is one experiment in this direction, but it at best approximates what clojure.core.logic.pldb is capable of. I hope that over Christmas I'll have time to play with using pldb to store, search and rewrite a "flattened" form of tools.analyzer ASTs.

Oxcart is out of date with tools.emitter.jvm and tools.analyzer. This shouldn't be hard to fix, but I just haven't kept up with Nicola's ongoing work over the course of the last semester. This will probably get done over Christmas as well.

Oxcart doesn't support a bunch of stuff. As of right now, defmulti, defmethod, deftype, defprotocol, proxy, extend-type and extend-protocol aren't supported. I'm pretty sure all of these actually work, or could easily work, they just didn't get done in the GSoC time frame.

Finally, and I think this is the thing that's really blocking me from working on Oxcart: it can't compile clojure.core anyway. This is a huge failing on my part in terms of emitter completeness, but it's a moot point because even if I could compile clojure.core with Oxcart, RT is gonna load it anyway at boot time. I also suspect that this is an incompleteness in the project as a whole which probably makes it an unacceptable thesis submission, although I haven't spoken with my adviser about it yet.

The Endgame

As of right now I think it's fair to call Oxcart abandoned. I don't think it's a worthwhile investment of my time to build and maintain a language fork that doesn't have to be a fork. I talked with Alexander, one of the clojure-android developers and a fellow GSoC Lean Clojure student/researcher, about this stuff, and the agreement we reached was that until 1.7 is released there's no way that the lib-clojure changes will even get considered, and that the most productive thing we can do as community members is probably to wait for 1.8 planning and then try to sell lib-clojure and related cleanup work on the basis of enabling clojure-android and lean clojure/Oxcart. Practically speaking, in terms of my time, if it's going to be a few months until 1.7 and then a year until 1.8, that leaves me only my last semester of college to work on Oxcart against an official version of Clojure that can really support it. If that's what it takes to do Oxcart, I'll likely just find a different thesis project or plan on graduating without a thesis.

(with-plug
  That said, serious interest in Oxcart as a deployment tool or another
  contributor would probably be enough to push me over the futility hump
  of dealing with a Clojure fork and get Oxcart rolling.)

^d

The Future of the LispM

This is also addressed to @lojikil towards whom I have threatened to write this post on multiple occasions. It is expected to piss off loper-os. Above all, it represents my opinions backed up only by a cursory research effort. You have been warned.

Background Of Historical LispMs

LispMs occupy a transitional period of computing industry history between time shared mainframes and the single user workstations that would become today's desktop computers. Prior to John McCarthy's invention of time sharing, all computers had been single program machines which multiple users shared via batch scheduling. The introduction of time sharing opened the way to exploring how to make computers useful interactively for many users sitting at terminals. Under batch processing, users would present fully formed programs, written by hand or with little tool assistance, to computer techs who would run the programs and return printed output.

The concept of time sharing and interactivity led to the development of more dynamic programming environments, including the Read Eval Print Loop. At the same time, the falling costs of integrated circuits (and DoD funding for research into the design of such VLSI machines) began to put the first workstations, single user machines featuring large memories, within the financial reach of the well funded AI laboratories which had previously depended on time shared mainframes.

In order to improve interactive performance, Richard Greenblatt and Thomas Knight developed the cons machine, the first of the MIT Lisp machines. Priced at ~$50,000 in 1970s dollars, the cons and the successor cadr machine were single user workstations, built primarily, it seems, to address the restrictions on memory and memory performance, and the consequent detrimental impact on general program performance, suffered on time shared systems due to the large memory footprint of Lisp programs and page thrashing. Hardware/microcode support was added for maintaining the relatively complex Lisp environment structures, and for many of the primitive lookup/load/store operations on cons cell structures that characterize a traditional Lisp implementation's data access pattern. This combination of machine support and microcoding enabled Lisp "compilers" to be far simpler, and offloaded the responsibility for maintaining complicated structures such as the environment from the compiler to the microcode system. Best of all, as the microcode corresponded relatively directly to Lisp, on the rare occasions that truly writing microcode (say for device drivers) was called for, doing so was easy because users could transparently invoke and define microcode from their native Lisp environment.

Seeing a business opportunity, in the early '80s a number of MIT AI lab researchers departed and founded Symbolics Inc. and Lisp Machines Inc. for the purpose of building and selling Knight machine derived Lisp workstations. Ultimately however, Lisp machines fell out of favor during the AI winter(s) and lost to what we now call the personal computer. I have seen the argument made that they were too big-ticket and missed the mass market as a result. Thus our story really begins.

Of Modern Hardware

In the years since the demise of the dedicated LispMs, Intel and the other major chip manufacturers have progressed from single cycle to pipelined, cache accelerated, superscalar and out of order machines. Knowledge of these designs is assumed. I suggest my earlier introduction series to these topics if you are not already familiar with them. At a high level however, each step of this progression trades off chip area, transistors (complexity) and power budget to wring more instruction level parallelism out of existing programs.

The Symbolics 3600 technical report from '83 claims a best case cycle time of 180ns (~5MHz) and a worst case of 250ns (4MHz). Arbitrary reads from main memory are priced at 600ns (3x the cost of a single instruction). This low (by modern standards) gap between processor and memory speed meant that one could write programs that did lots and lots of random memory access and still have reasonable performance expectations. However, as a stack machine architecture, there is no opportunity for the instruction level parallelism exploited by modern architectures, because the exact state of the stack, on which every instruction depends, changes with every instruction. This forces all program execution to wait during slow operations like main memory reads, division and floating point math due to the sequential data dependency on the stack. This is a fundamental architectural failure common to all stack machines, skirted by modern software stack machines like the JVM by compiling literal stack instructions to a register machine in order to recover instruction level parallelism.

Modern processors claim cycle times of 0.3̅ns for a 3GHz machine, often running multiple instructions per cycle. Unfortunately, while processor core clocks have gotten faster since the days of the 3600, memories have not fundamentally improved. On modern hardware, 100ns for a random main memory read seems to be more or less normal. That is a speedup of over 500x in processor cycle time set against only a 6x improvement in main memory latency: a random read that once cost about 3 cycles now costs about 300, an effective slowdown of nearly two orders of magnitude on memory reads. Now this is admittedly worst case memory latency. Thanks to the L1 and L2 cache layers, memory read times of 1ns to 3ns are not uncommon, meaning that for cache efficient programs it is quite reasonable to expect less than 2ns, or about 6 cycles, for memory reads hitting in the L1 or L2 cache.

I mention this because techniques like cdr coding serve to bring the average case of list access and construction down from its worst case behavior of generating massive heap fragmentation (and thus defeating caching and prefetching) towards the behavior of nicely caching sequential data layouts (classical arrays), typically via tree like structures or SRFI-101. While traditional linked list structures, due to potential heap fragmentation, provide potentially atrocious cache locality and thus cripple the performance of modern cache accelerated machines, more modern datastructures can simultaneously provide the traditional cons list pattern while improving cache locality and thus performance characteristics. Guy Steele himself has even come out arguing this point.

Without a literal implementation as a stack machine traversing linked lists and providing special hardware support for binding and so forth, the LispM instantly loses much of the vaunted simplicity which makes it attractive to program directly, and becomes simply another register machine in the mold of modern ARM or Intel machines, with strange long running microcoded instructions reminiscent more of the VAXen than of modern RISC inspired processors. In short, due to the many performance limitations of linked lists as data structures, we as an industry will never again build machines, as the original LispMs were, for the sole purpose of traversing and manipulating linked list data structures. Modern hardware simply offers better performance with more general applicability.

Of Modern Languages

We've also come a long way in language design and implementation. Compilers, once slow, have gotten faster and smarter. Virtual machines like the JVM, JavaScript and the CLR are becoming widely used deployment targets. The ML and Haskell families of languages have introduced us to concepts of real abstract types and abstract effects which can be used to build programs coupled only by the abstract properties of the data being consumed, generated and effects produced. Type inference is even making such fancy behavior manageable by mere mortals, while providing language implementations with more and more information with which to perform both program level optimization and micro-optimization not possible in traditional naive lisps.

Of LispMs Future

While we'll never build a hardware LispM again, I suspect that we will see LispM like systems return one day in the not too distant future. Not as hardware, but as software or a virtual machine designed to run atop existing and entirely adequate modern hardware. Now in 10 years someone may make a commercial hardware LispM at which point I will be happy to eat my words, but as of right now I don't see it happening.

The JVM and the .NET CLR have proven to be amazing platforms. While their allocation requirements perhaps prohibit their use for implementing operating systems and driver code (not that people haven't tried this sort of thing), they do offer excellent and standardized platforms. It is my personal belief that, by leveraging the flexibility that Lisp dialects have for both data driven DSLs and macro DSLs, it would be more or less trivial to implement an operating system using a DSL for generating platform specific assembly. As the "kernel" OS is implemented in the final "host" LispM language, editing the kernel is possible, albeit difficult due to the inherent difficulties of hot swapping operating systems.

Add a JIT (no, this isn't easy), preferably implemented in the same assembler DSL as the host operating system, and the stage is set for building a LispM environment as a virtual machine running atop modern hardware, using a modern JIT to achieve reasonable performance characteristics. From this vantage the way is clear to implementing the rest of the operating system and userland in a fully fledged Lisp with good reloading and introspection support, atop this kernel of a JIT and enough of an OS to provide memory and process management.

This is, I think, an entirely reasonable and practical project for a smallish (~3 or fewer) team and a single target architecture (Intel). There was a project, Dream, an r4rs x86 assembler DSL in which an r4rs interpreter and operating system were implemented. It booted and worked more or less fine, and while it lacked polish I think it serves as an excellent proof of concept that such a self hosting Lisp (assembler, OS, runtime) triple is viable. I don't argue that such an implementation will be trivially portable to a wide variety of target hardware, because as experience shows, porting nominally portable Linux, FreeBSD or FreeRTOS is actually quite hard, and porting a JIT can at best be as hard due to tight coupling with the specific behavior of the target machine; but this is why we specify an abstract machine and then let implementations deal with the mess, as the JVM does.

I also think that Lisp as a language family could stand to learn some new tricks if someone were to make the effort to build a full Lispy OS/VM thing.

Clojure is awesome. I've written a boatload of it. It's an amazingly simple and elegant tool, and its persistent/immutable datastructures are absolutely worth stealing, as is its concept of hygienic macros via symbol qualification, and probably the concept of protocols/interfaces which Clojure kinda inherits from its primary host, Java.

Racket is also awesome, although I can't claim to have written a bunch of it. Its approach to static compilation, its strong support for documentation and examples via Scribble, and its pattern matching and typechecking facilities are totally worth stealing.

Shen is interesting if only for its built in Prolog engine, which enables searching for arbitrary proofs with regards to arbitrary program properties. While criticized by the Haskell community for disregarding completeness, the ability to write and search for proofs with regards to arbitrary properties of programs, not just types, is I think amazingly powerful, albeit an active research area.

But is it a LispM?

There are some people who will rail against this vision of mine, in which the "OS" is as low as we need to try and push a language. To my mind, the advantages of pushing a language to the hardware level are scant enough, as argued above, that doing so simply does not justify the investment or the delay. Even the OS may be too low to push a language: the adventure of building an OS with a language to host that language, ala Oberon, does offer maximal conceptual unity across the combined platform and a huge leap in the ease of integrating pipelines of different applications, but it also forcibly discards programs not written in HOST_LANG for HOST_PLATFORM. This is a problem if only because the Unix model of computing, as implemented on Linux with the help of the GNU userland, is second only to the Mach hiding inside of Mac OS X in terms of apparent developer adoption. Developers booting the next Dream for the first time won't find familiar text editors or tools. It will be a brave new world.

Maybe that's a good thing. Maybe we can escape the legacy of Unix with its untyped byte streams and instead revive some of Plan 9, with a basis in lispy goodness and with more types everywhere. Maybe there's even a place in this for algebraic effects. Maybe we can join urbit in looking towards a functional, fully networked future. Maybe. If only we can look past the obsolete machines and operating systems of yesteryear with which we seem content to circlejerk, and move forwards with appropriate context and vision.

Or maybe we'll just be stuck with Linux, Unix, OpenSSL and the rest of our somewhat suspect tower of legacy code out of sheer inertia and continue getting caught up in language of the month rages out of collective ADD.

 We have not wings, we cannot soar;
       But we have feet to scale and climb
 By slow degrees, by more and more,
       The cloudy summits of our time.

 The mighty pyramids of stone
       That wedge-like cleave the desert airs,
 When nearer seen, and better known,
       Are but gigantic flights of stairs.

 The distant mountains, that uprear
       Their solid bastions to the skies,
 Are crossed by pathways, that appear
       As we to higher levels rise.

 The heights by great men reached and kept
       Were not attained by sudden flight,
 But they, while their companions slept,
       Were toiling upward in the night.

~ Longfellow

^d