Intermediate Abstraction

This talk was presented at the Bay Area Clojure meetup, hosted by Funding Circle. These notes are presented here with thanks to the organizers and attendees.

1 Preface

Hey folks! For those of you who don't know me, I'm Reid McKenzie. I've been writing Clojure on and off for about four years now. These days I work for Twitter as a Site Reliability Engineer.

Twitter Site Reliability is really small. We're not even 150 engineers tasked with ensuring that all the infrastructure and services required to support the business stay online. It's clear that as the business continues to grow, we in SRE can't continue to be traditional reactive system operators. We must be software engineers developing automation and designing systems capable of intervention-less recovery.

Achieving quality in software engineering means we must develop an understanding of how to exercise our powers of abstraction, and that we must appreciate and manage the complexity with which we are surrounded. This talk presents some formal and philosophical concepts intended to elevate the ways we approach problem solving, which I hope will be useful to you.

2 Context

Programming is a young profession all things told.

If we trace the roots of our profession to Lovelace, as she was the first to have written of a machine which could be used for general purposes and perhaps program or modify itself, then our profession such as it is dates to the 1830s. Just shy of the two century mark.

Date     | Who             | Event
---------+-----------------+------------------------------------------------------------
1928     | Hilbert         | Entscheidungsproblem
1933     | Gödel, Herbrand | μ-recursive functions (simple arithmetic)
1936     | Church          | λ calculus, models Entscheidungsproblem with reduction
         | Turing          | Turing machine, models Entscheidungsproblem with halting
1941     | Zuse            | First programmable, general purpose computer
194{5,6} | Von Neumann     | Von Neumann single memory architecture
1947     | Bell Labs       | Transistors are invented
1950     | USGOV, Banks    | Computers become commonplace for census & data processing
1970     | IBM, Commodore  | First transistorized PCs come to the market
1976     | Wozniak         | Apple I
2017     |                 | Us, here, today, floundering variously

Computation as a science is grounded in the concepts of evaluation and analysis. The science is in the theoretical limits of what a machine could do. The question of how to achieve something on a given machine is the province of software engineering.

Software engineering is a younger discipline still. We've only really been writing software since the '50s (excepting Lovelace's programs, which never had a machine), so call it 60 years of recognizable programming, during the course of which huge shifts have occurred in the industry.

In retrospect one can only wonder that those first machines worked at all, at least sometimes. The overwhelming problem was to get and keep the machine in working order.

– Dijkstra "The Humble Programmer" EWD340 (1972)

The problem with software engineering is, well, for the most part we never get to do it. The profession of programming as adopted in industry is rooted in the now seemingly long tradition of just getting the machine to work at all and then "debugging" it, a tradition which has carried on since the first programs in the '50s. The operations tradition from which SRE programs are usually developed is particularly guilty of this.

Rather than have a concrete body of science to reference and with which to inform decisions, every working programmer develops an aesthetic taste for what seems to them like the right way to solve problems; what kept the computer or systems of computers in working order and solved the problem best. The problem with taste is that it is deeply subjective. It reflects our personal experiences in education, what JavaScript frameworks we've played with on the side and how we did it at $LASTJOB. When tastes conflict, they can't inform each other except with great difficulty, because there's no underlying theory which can be used to relate and moderate them.

The question is how do we earn the E in our titles. And the answer is not to continue artistically operating the systems we have or continuing to do what seems to work. We must develop and communicate formal understandings for the tools and tactics we use.

3 Complexity

Simple; Simplex; To have but a single strand, to be one.

Complex; To be woven from several strands.

Complect (archaic); To plait together.

Easy; from the Old French aisié meaning comfortable and tranquil. Not difficult, requiring no great labor or effort.

– Rich Hickey "Simple Made Easy" (2012)

Often, we misuse simple to mean easy.

Ideally, we want to write code which is both simple and easy. This requires that we be conscious both of complexity and of ergonomics. However we only rarely have objective criteria for these two critical dimensions.

The fact that a perfectly reasonable engineer may consider a tool sufficient and easy while another may just as reasonably consider the same tool baroque and overbearing is a core motivator for this talk. Sharing an objective metric of simplicity clarifies otherwise subjective questions and enables us to agree upon measurements of solution quality.

Thankfully there is prior art on the subject of attempting to estimate complexity. Karl Popper's Logic of Scientific Discovery builds a framework of formal logic with which to formalize the scientific process itself.

Popper bases his project upon falsifiability and corroboration through experiment. A hypothesis for Popper is a formal statement which is falsifiable because it implies testable outcomes. An experiment cannot confirm a hypothesis, because to do so would imply that no other experiment (state of the entire universe) could possibly falsify the hypothesis. However, we can fail to falsify the claims of the hypothesis, and we can even show that the experiment produced a state which corroborates the prediction(s) of the hypothesis.

Popper proposes that the complexity of a theory can be measured by reasoning about the set of states of the universe which would falsify the theory. These sets can be related to each other and relative complexity measured by establishing implication and subset relations, but this is far from enough to be a really working rule of measurement. Comparing infinities (sets of states of the universe) is hardly intuitive.

The good news is that there are ways we can do this!

Cyclomatic complexity (McCabe) measures the number of linearly independent control paths through a program. This approach works, and even gives an obvious heuristic for recognizing complexity in the simple number of branches, but it is uni-dimensional and doesn't capture the fact that our programs have and manipulate state.
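
To make the branch counting heuristic concrete, here is a minimal sketch which approximates cyclomatic complexity for Python source using the standard ast module. The choice of which node types count as decision points is my own assumption for illustration, not McCabe's exact definition.

```python
import ast

# Node types treated as decision points. This list is an illustrative
# assumption, not McCabe's exact definition.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def approx_cyclomatic(source):
    """Return 1 + the number of decision points found in `source`."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` contributes two extra short-circuit paths.
            decisions += len(node.values) - 1
    return 1 + decisions


STRAIGHT_LINE = "def f(x):\n    return x + 1\n"
BRANCHY = (
    "def g(x):\n"
    "    if x > 0 and x < 10:\n"
    "        return 1\n"
    "    for i in range(x):\n"
    "        x += i\n"
    "    return x\n"
)

print(approx_cyclomatic(STRAIGHT_LINE))  # 1
print(approx_cyclomatic(BRANCHY))        # 4
```

Note that nothing in this count changes if g also reads or writes a global, which is exactly the blind spot described above.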

Dataflow complexity is a related family of metrics which try to measure the possibility space of the data (state) in a program. Unfortunately, beyond giving an intuition for the idea that complexity grows with the memory footprint of a program there's not an efficient way to quantify or relate this kind of complexity.

We get into this mess of having to count states in order to talk about complexity because both of these complexity estimators use the model of a Von Neumann finite state machine whose states cannot easily be related to each other.

If we go back to Popper, if we have two logical reduction statements P and Q, we can estimate their relative complexity by simply comparing the number of degrees of freedom of each expression. Likewise using the μ calculus (simple arithmetic and reduction) or the λ calculus (function application and reduction) we instantly regain the property that we can understand the complexity of an expression or program merely by looking at the number of terms it involves and counting them.

This intuition is supported by cyclomatic complexity and dataflow complexity metrics because (for programs which halt under reduction) the complexity of a program written with reduction is, well, one. Each datum and program point is unique and occurs only once.

(λx.xx) (λx.xx)

→ (λx.xx) (λx.xx)

This must be an estimate, not a precise metric, as a divergent combinator such as ω will mess this whole thing right up. But per the halting problem there's not much we can do here, so if we accept (as we must) that this is an incomplete analysis and restrict ourselves to the domain of terminating expressions, we can still get a lot of utility out of this rule of thumb.

Now, we don't write purely functional programs, and when we do write programs which use functional tools and abstractions there's still global state lying around because we compute on Turing machines not graph reduction engines (those are really hard to build). How do we estimate complexity for real programs then?

It has been suggested that there is some kind of law of nature telling us that the amount of intellectual effort needed grows with the square of program length. But, thank goodness, no one has been able to prove this law. And this is because it need not be true.

– EWD, "The Humble Programmer"

There is room for reasonable disagreement here, but I'd propose a very simple heuristic for estimating complexity; the product of the following properties of a program or program segment.

  1. 1 + Number of input parameters
  2. 1 + Number of branches
  3. 1 + Number of back-edges
  4. 1 + Amount of REACHED CAPTURED state which is accessed (maybe 2x cost of a parameter)
  5. 1 + Amount of REACHED CAPTURED state which is modified (maybe more than 2x the cost of a parameter)

(defn map [f coll]
  ;; seq returns nil for an empty collection, which ends the recursion
  (when (seq coll)
    (cons (f (first coll))
          (map f (rest coll)))))

That is, a function which accepts more parameters and takes many branches using them is more complex than a function which accepts a function and a few other parameters and merely delegates to the argument function. The real branching behavior of such an applied higher order function may be complex, but the function itself is simple. It captures little state and does little directly.

(defn log*
  "Attempts to log a message, either directly or via an agent; does not check if
  the level is enabled.
  For performance reasons, an agent will only be used when invoked within a
  running transaction, and only for logging levels specified by
  *tx-agent-levels*. This allows those entries to only be written once the
  transaction commits, and are discarded if it is retried or aborted.  As
  corollary, other levels (e.g., :debug, :error) will be written even from
  failed transactions though at the cost of repeat messages during retries.
  One can override the above by setting *force* to :direct or :agent; all
  subsequent writes will be direct or via an agent, respectively."
  [logger level throwable message]
  (if (case *force*
        :agent  true
        :direct false
        (and (clojure.lang.LockingTransaction/isRunning)
             (*tx-agent-levels* level)))
    (send-off *logging-agent*
              (fn [_#] (impl/write! logger level throwable message)))
    (impl/write! logger level throwable message)))


(declare ^:dynamic *logger-factory*)


(defmacro log
  "Evaluates and logs a message only if the specified level is enabled. See log*
  for more details."
  ([logger-factory logger-ns level throwable message]
   `(let [logger# (impl/get-logger ~logger-factory ~logger-ns)]
      (if (impl/enabled? logger# ~level)
        (log* logger# ~level ~throwable ~message)))))

An expression which does little, but does so using values which it draws from global state such as a configuration value, may still be simple, but it is not so simple as a function which accepts that same configuration structure directly as a parameter. This complexity becomes evident when testing. A test for the simple function merely requires calling it with the appropriate configuration value, whereas testing the globally configured function requires manipulating the state of the entire application so that the shared mutable configuration state conforms to the test's requirements.
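
The testing difference can be sketched as follows. The CONFIG dictionary and both greet functions are hypothetical names invented for this illustration; they don't come from any real library.

```python
# Hypothetical module-level configuration: shared, mutable state.
CONFIG = {"greeting": "hello"}


def greet_global(name):
    """Reads the greeting from captured global state."""
    return "%s, %s" % (CONFIG["greeting"], name)


def greet(config, name):
    """Receives the same configuration as an ordinary parameter."""
    return "%s, %s" % (config["greeting"], name)


def test_greet():
    # The parameterized function needs only a literal value.
    assert greet({"greeting": "hi"}, "reid") == "hi, reid"


def test_greet_global():
    # The global version forces the test to rewrite shared state and
    # then restore it so that other code is unaffected.
    saved = dict(CONFIG)
    CONFIG["greeting"] = "hi"
    try:
        assert greet_global("reid") == "hi, reid"
    finally:
        CONFIG.clear()
        CONFIG.update(saved)


test_greet()
test_greet_global()
```

The second test is longer only because of the save and restore dance, and it is still unsafe if anything else reads CONFIG concurrently.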

We can see that the log macro is actually quite complex because it reaches a significant amount of global state, despite the fact that the log macro itself doesn't actually do that much. The log macro has to check in a global mutable registry of namespaces to logger implementations for a logger, check whether that logger is enabled and then do the logging, potentially manufacturing a new logger using the factory from global state if it has to.

(defn mongo!
  "Creates a Mongo object and sets the default database.
      Does not support replica sets, and will be deprecated in future
      releases.  Please use 'make-connection' in combination with
      'with-mongo' or 'set-connection!' instead.
       Keyword arguments include:
       :host -> defaults to localhost
       :port -> defaults to 27017
       :db   -> defaults to nil (you'll have to set it anyway, might as well do it now.)"
  {:arglists '([:db ? :host "localhost" :port 27017])}
  [& {:keys [db host port]
      :or   {db nil host "localhost" port 27017}}]
  (set-connection! (make-connection db :host host :port port)))

A function which changes state may be complex compared to a function which produces no effects and returns a value, but it introduces far more whole program complexity if global state is modified, especially if it is modified many times.

This supports the intuition that sorting a list received as an argument in place perhaps doesn't add so much whole program complexity, because the effect is restricted in scope to the caller and to wherever the list was sourced from. Reading and manipulating the errno global value in a C program, by contrast, may directly impact any number of other program behaviors. Such modification defeats referential transparency, and forces us to use sequential logics at the level of the whole program to achieve understanding.

We use multiplication rather than addition to reflect that really what we're trying to approximate is the VOLUME of a multi-dimensional state space, which is measured by the product of the length of the sides, not their sum.
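
As a rough sketch, the five properties listed earlier can be multiplied together like so. The weights for captured state and the counts plugged in below are my own assumed values, chosen only to show how quickly the product grows.

```python
def estimate_complexity(params, branches, back_edges,
                        state_read=0, state_written=0,
                        read_weight=2, write_weight=3):
    """Multiply (1 + count) over the five properties listed earlier.

    read_weight and write_weight are assumed values standing in for the
    "maybe 2x" and "maybe more than 2x" costs of captured state.
    """
    return ((1 + params)
            * (1 + branches)
            * (1 + back_edges)
            * (1 + read_weight * state_read)
            * (1 + write_weight * state_written))


# Hypothetical counts for a recursive map: two parameters, one branch,
# one back-edge (the recursive call), no captured state.
pure_map = estimate_complexity(params=2, branches=1, back_edges=1)

# The same shape, but reading two pieces of captured state and writing one.
stateful = estimate_complexity(params=2, branches=1, back_edges=1,
                               state_read=2, state_written=1)

print(pure_map, stateful)  # 12 240
```

Touching three pieces of captured state multiplies the estimate twenty-fold here, where a sum would barely have moved.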

With apologies to Dijkstra, I note that it is quite possible for a program's complexity by this metric alone to grow at least in proportion to the square of the program's length. However there's also still plenty of room at the bottom for simple programs to achieve near-linear complexity growth.

4 Abstraction

We all know that the only mental tool by means of which a very finite piece of reasoning can cover a myriad cases is called “abstraction”; as a result the effective exploitation of his powers of abstraction must be regarded as one of the most vital activities of a competent programmer. In this connection it might be worth-while to point out that the purpose of abstracting is not to be vague, but to create a new semantic level in which one can be absolutely precise.

– EWD, "The Humble Programmer"

What is an abstraction?

Abstractions can be considered to consist of a model, an interface and an environment.

The model of an abstraction is the set of operational semantics which it exposes to a user. For instance a queue or channel as an abstraction provides the model that a user writes to it and a consumer reads from it. Reads may be in some order such as FIFO or LIFO, or un-ordered with respect to writes. A bounded queue may provide the additional semantics that, if the writer outpaces the reader, it can only get so far ahead before the queue put operation blocks the writer. A Communicating Sequential Processes queue provides the operational semantics that when a write occurs the reader is awakened, and so only the reader or the writer is ever operating at once.

The interface is the set of operations which the abstraction provides to the user. For instance the channel can have elements enqueued and dequeued. Perhaps the queue can also be closed at one end, denoting that either the source or the destination is finished whereupon the other end will eventually (or immediately) also close.
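
For a concrete feel of model versus interface, here is a small sketch using Python's standard queue module as a stand-in for a bounded channel. It is not a CSP channel, just the nearest thing in the standard library.

```python
import queue

# A bounded FIFO channel with room for two elements.
channel = queue.Queue(maxsize=2)

# Interface: enqueue...
channel.put_nowait("a")
channel.put_nowait("b")

# Model: a writer that outpaces the reader is stopped at the bound.
# put_nowait raises instead of blocking so the sketch cannot hang.
try:
    channel.put_nowait("c")
    writer_was_stopped = False
except queue.Full:
    writer_was_stopped = True

# Interface: ...and dequeue. Model: reads come back in write order.
received = [channel.get_nowait(), channel.get_nowait()]

print(writer_was_stopped, received)  # True ['a', 'b']
```

The interface here is just put and get. The bound and the ordering are properties of the model, which the user has to know about even though no operation names them.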

Abstractions don't exist in a vacuum. They have expectations which they may demand of the outside world (externalities) such as restrictions on how they are used or in what context they are employed. Everything that the model omits is ultimately an assumption about the environment.

It may be helpful to note that the environment is the dual of the model. Anything that is not part of the model must be part of the environment, and by growing the model we can shrink dependency on the environment.

This is deeply significant, in that mismatch between environments and reality results in friction for the user, and tends to be resolved not by abandoning the abstraction but by humans adapting to the restrictions imposed upon them by inadequate tools.

Abstractions may be compared - one said to be simpler than the other - on the basis of the size of the model, interface and expectations imposed on the environment. The model after all is a space of states, the interface a set of functions over it, and externalities may be either a set of propositions or a space of states as the two are equivalent.

The ergonomics of abstractions may also be compared - one said to be more consistent than the other - on the basis of the consistency of the names and structure of the abstraction's interface with each other and the predictability of the relationship between the interface, the model and its externalities.

4.1 Examples of Abstraction

Per the Dijkstra quote above, abstraction is all about changing or choosing semantics. I'll give a treatment of linguistic abstraction shortly, but by naming things we can build semantic levels - languages - which convey what we want.

We can abstract over control flow by using higher order functions which provide, for instance, conditional execution. This gives us a tool with which we can talk about conditionally applying a transformation.
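
A minimal sketch of such a higher order function might look like this. The names apply_when and clamp_negative are made up for the example.

```python
def apply_when(predicate, transform):
    """Return a function applying `transform` only where `predicate` holds."""
    def conditional(value):
        return transform(value) if predicate(value) else value
    return conditional


# Hypothetical use: clamp negative readings to zero, leave the rest alone.
clamp_negative = apply_when(lambda x: x < 0, lambda x: 0)

readings = [3, -1, 4, -5]
print([clamp_negative(r) for r in readings])  # [3, 0, 4, 0]
```

The branch still exists, but it has a name and lives in one place, so callers talk about "clamping" rather than about if statements.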

We can abstract over values by choosing types or decomposing values into their components, using aggregate logical values and references which enable us to refer to parts of our data.

However it is not abstractive to perform computation. When we "compute" (reduce) an expression, we discard information irrecoverably. For instance an average is not meaningfully an abstraction over a time series. It is a datum, the product of some other irrecoverable data under a function. By computing this new value, we've given ourselves an aggregate view which no longer carries the same semantics and cannot meaningfully be considered to be equivalent to its source.
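
A tiny illustration of the information loss, using two made-up series:

```python
def mean(series):
    return sum(series) / len(series)


steady = [5, 5, 5, 5]
spiky = [0, 0, 0, 20]

# Different sources, identical aggregate: the mean cannot tell them apart,
# and neither series can be recovered from it.
print(mean(steady), mean(spiky))  # 5.0 5.0
```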

4.2 Limits of Abstractions

Unfortunately, abstractions do have limits.

Traditional types which you may be familiar with, your int and char, are really just raw presentations of what the physical machine can do with a given value. Rarely however do these presentations of machine capabilities map to the senses which are meaningful to us. They are the poorest possible of abstractions - none at all.

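#include <stdio.h>
#include <stdint.h>

#define EVAL(...) printf("%s: %d\n", #__VA_ARGS__, __VA_ARGS__)

int main(int argc, char** argv) {
  /* Shifting into the sign bit and overflowing are formally undefined in C.
     The output below is what a typical two's complement target prints. */
  EVAL((int32_t)(1<<31)-1);
  EVAL((int32_t)(~(1<<31)));
  EVAL((int32_t)(1<<31));
  return 0;
}

(int32_t)(1<<31)-1: 2147483647
(int32_t)(~(1<<31)): 2147483647
(int32_t)(1<<31): -2147483648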

How many times in Java have you actually wanted precisely a value with 2^32 states, of which 2^31-1 states are said to represent positive numbers, 2^31 states represent negative numbers and the zeroed bit string represents … zero? And arithmetic wraps around, flipping the sign should we exceed the bounds of [-2^31, …, +2^31-1]. We also know that while we have a vector of bits … somewhere and one of them is a sign, and we can count on the 32nd bit being the sign, we can't actually make endianness assumptions about the bytes in our integer(s). So much for bitmasking tricks.

This abstraction of an int type provides addition, subtraction, multiplication, division, absolute value and of course comparison, all of which we associate with the concept of an integer. Furthermore this abstraction supplies, by specifying a model of the machine representation, all the operations we customarily associate with a vector of bits - and, or, xor, shifting, inversion.

If all we want is addition and comparison, the rest of this behavior is not just semantic baggage, it adds complexity. What if I want to count to 2^31? I have to be aware that this abstraction's model says I'll get -2^31, not the value I expect. Or perhaps an out of band signal that an IntegerOverflowException occurred, which constitutes another cross-cutting concern because now the abstraction implies a system of exceptions with which I must be familiar.

For a large domain of problems, this abstraction does well enough. However we must be constantly aware of its failure modes. We as professionals bear a constant cognitive cost in conforming ourselves to the limits which this abstraction presents to us.

It should be recognized here that Haskell, Lisp(s) and other languages which provide arbitrary precision integer types out of the box capture a simpler programming model by default. It is a model which has more implementation complexity to be sure, and which consequently expects more of the environment (for instance the presence of a memory manager), but the interface is smaller and less thorny.
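
The two models can be put side by side in a short sketch. Here ctypes.c_int32 from Python's standard library stands in for a machine integer, and a plain Python int stands in for an arbitrary precision one.

```python
import ctypes

INT32_MAX = 2**31 - 1

# ctypes.c_int32 performs no overflow checking, so the value wraps the
# way a 32-bit two's complement register would.
wrapped = ctypes.c_int32(INT32_MAX + 1).value

# A plain Python int is arbitrary precision and simply keeps counting.
unbounded = INT32_MAX + 1

print(wrapped, unbounded)  # -2147483648 2147483648
```

Counting one past the bound silently flips the sign in the first model and does the obvious thing in the second, at the cost of an allocator behind the scenes.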

4.3 Inheritance, Composition & Re-use

Software quality is defined first by solving the business needs of the here and now. If software or strategy or any other tool doesn't do that, it's worthless. Software is a form of physical capital in the same sense as a steam engine or a factory. A factory which can be re-tooled to produce articles of a different design is more valuable than a factory which will require full replacement should the product change. Likewise an engine or assembly line which must be significantly reworked in order to add capacity or change task is less valuable than a design which naturally features a pattern along which it can be changed without significant alteration.

Object Oriented Programming (that is, programming via encapsulation and inheritance) promised that it would improve, among other things, software reusability when it broke onto the scene.

The problem with inheritance as a strategy in practice is that any tool which wants to participate in some abstraction is forced to conform to the expectations of the interface established by the parent which the child wishes to participate in. The interface must accumulate.

Furthermore, reuse through inheritance has the problem of capturing mutable state. An extension or wrapper class must be intimately familiar with the state of the class it wraps and the expectations it may make. The model accumulates.

In "Simple Made Easy" Rich gave a good treatment of this, pointing out that code partitioning doesn't imply decoupling and pointing to Interface Oriented Programming as the response to this problem, because interfaces only bring with them their direct set of expectations.

To be precise about what's happening here - the model of inheritance oriented programming forces us to accrete interfaces and externalities. Thankfully this is not an essential property of abstraction, merely a property of this particular technique.

(defn mapcat [f & colls]
  (apply concat (apply map f colls)))

When we take two abstractions and put them together, we say that we've composed them. For instance two functions compose together to make a third. The map function composes with the concatenate function to give us mapcat. map lets us apply a function over many values, and concat allows us to join sequences together. We can thus define mapcat to be a function which lets us join the results of applying a function which produces sequences to many values. The composite model of mapcat requires only the additional constraint that f: a → [b], that is the function to be mapped produces results which can be concatenated.
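
The same composition can be sketched in Python, with itertools.chain playing the part of concat. The neighbours function is a made-up example of an f which returns a sequence.

```python
from itertools import chain


def mapcat(f, *colls):
    """Concatenate the sequences produced by mapping f over colls."""
    return list(chain.from_iterable(map(f, *colls)))


def neighbours(n):
    # f must return something concatenable: the f: a -> [b] constraint.
    return [n - 1, n + 1]


print(mapcat(neighbours, [10, 20]))  # [9, 11, 19, 21]
```

Neither map nor chain knows about the other. The only new expectation introduced by composing them is on the return type of f.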

;; filter partially applied with a predicate leaves a function of one collection
(def evens (partial filter even?))

We can build abstractions which have smaller interfaces. For instance if we take a function of a configuration and a datum and we partially apply it with a configuration, we now have a fixed function of a datum. Its model is smaller - we know how it will behave because we've specified whatever configuration(s) the larger base model depends on - and the interface is smaller - it consists only of the datum to manipulate.
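
Here is a minimal sketch of that move using functools.partial. The render function and its configuration keys are hypothetical.

```python
from functools import partial


def render(config, datum):
    """A function of a configuration and a datum."""
    return "%s%s%s" % (config["prefix"], datum, config["suffix"])


# Fixing the configuration leaves a function of the datum alone.
# The configuration keys here are hypothetical.
bracket = partial(render, {"prefix": "[", "suffix": "]"})

print(bracket("callisto"))  # [callisto]
```

Callers of bracket can no longer get the configuration wrong, because it is no longer part of the interface they see.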

Building a small wrapper around a large Java API would be a good example of such a composite abstraction with a smaller interface.

We can also build composite abstractions which traverse the model and externality trade-off. For instance we can build abstractions which simplify the set of externalities by providing input or state validation where previously we had unchecked expectations. This expands the model since now the expectations are explicitly modeled and checked. We can also reduce the complexity of the model by choosing to make assumptions.
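
One way to sketch the first direction, moving an unchecked expectation into the model, is a small validating wrapper. The names checked, unchecked_head and head are invented for this example.

```python
def checked(precondition, message):
    """Wrap a function so an unchecked expectation becomes a checked one."""
    def wrap(f):
        def wrapper(value):
            if not precondition(value):
                raise ValueError(message)
            return f(value)
        return wrapper
    return wrap


def unchecked_head(items):
    # Silently expects a non-empty sequence: an externality.
    return items[0]


# The expectation is now part of the model and fails with a clear error.
head = checked(lambda items: len(items) > 0,
               "head of an empty sequence")(unchecked_head)

print(head([1, 2, 3]))  # 1
```

The wrapped function has a slightly larger model, since "raises ValueError on empty input" is now part of it, but it asks less of its callers.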

4.4 Naming

Credit: Zachary Tellman, Elements of Clojure (unreleased).

Linguistic abstraction in the form of naming is the most fundamental tool available to us as both artisans and engineers.

Names being the tools we use to understand our tools, choosing good names (whatever that means) is of primary importance. But what characterizes a "good" name?

Frege suggests that a name consists of three properties: the sign, being the textual representation of the name; the referent, being the entity referred to by the sign; and finally the sense, being the properties ascribed to the referent.

The traditional example is that Phosphorus (the morning star) and Hesperus (the evening star) are both Greek names for celestial bodies which we now understand to both refer to the planet Venus. The sign and referent are, according to Frege, insufficient to understand these two terms, which are prima facie synonyms, because they don't actually interchange.

A good name must satisfy two properties - it must be narrow and consistent.

A narrow name excludes things it cannot represent; its sense is only as broad as the context of its use requires. For instance if the only expectation to be made of a reference is that it will be a mapping, then map or its contraction m is a perfectly acceptable name if that is the only sense in which the referent is used.

A consistent name shares the same sense for a referent as is used for that referent in other related contexts. To continue the example of a mapping, to subsequently sign the same referent as hashmap elsewhere would be less than ideal, both because it's a narrower name which implies a specific kind of mapping and because it no longer shares the same sense. A simple mapping m may only imply clojure.lang.ILookup, whereas hashmap implies java.util.HashMap and side-effectful update. Naming and naming conventions are tremendously important in the context of dynamically checked systems, wherein names largely depend on their sense to communicate the type or at least the intended use of the referent.

Let's consider some examples (adapted from Zach's fine book).

;; This is a Very Bad expression.
;; The left referent is very broad, when we clearly use it in a narrow sense. It
;; should be more narrowly named.
;; The value we get out of our broad referent is named very narrowly. The single
;; string contains far too much sense.
(get m "sol-jupiter-callisto")

;; This is a better equivalent expression, because we communicate a more
;; appropriate sense for our previously over-broad sign, and we unpacked the
;; previously overly precise key into a set of more general structures each of
;; which communicates less sense.
(get-in starsystem ["jupiter" "callisto"])

;; Better yet capture the logical operation, the fact that Callisto is a moon of
;; Jupiter is an assumption in all of the above expressions.
(get-body-by-name system "callisto")

;; It would also be better to be completely precise about the sense of the name
;; Callisto by doing something like
(-> (get-system "sol")     ; This communicates that we want to get a "system" named sol
    (get-body "callisto")) ; And we care about the component "body" named callisto

;; This expression further communicates the sense that Callisto is a moon of Jupiter.
(-> (get-system "sol")
    (get-body "jupiter")
    (get-moon "callisto"))

;; Both of these two latter expressions are better because more of the relevant
;; sense of the name callisto is captured explicitly, rather than being implied
;; upon or expected of the seemingly broad mapping referent.

4.5 Achieving Abstraction

As suggested in the discussion of the consistency of signs, part of the problem here is that we're forced by the tools we use to project our sense directly upon the sign.

Types: A heresy  | static sense     | dynamic sense
-----------------+------------------+-------------------------
static dispatch  | Haskell, C, Java | Java (class casting)
dynamic dispatch | C (dyn linking)  | Smalltalk, Python, Ruby

Static type systems provide a model with which we can talk about the sense we associate with a name. They capture at least some of the sense in which we intend to use the referent as a type. If we can determine a type (and that type must conform to one we knew ahead of time), then we can begin to understand the behavior of our systems through the lens of the types we've assigned to our data.

Some systems allow us to use this information statically to check that at least some part of the sense or type is consistently respected in the use of a referent.

Some systems enable you to capture more sense than others, by encoding the sense statically in what we usually call a type.

Type systems can be compared in terms of how much sense they enable you to capture, and how difficult it is to do so.

Let's say that we wanted to count up from zero, to continue the example of positive integers from above. We could choose a sign with which to reference our value which indicates that it's an index, such as the traditional i, or idx or even index depending on individual taste. But sometimes we want to play clever tricks with the fact that negative indexes are sometimes defined, or we want to be tricky and reach back to some location in memory "before" where we're currently indexed, and so forth. The choice of sign implies only sense; it cannot constrain the referent.

However, we can choose abstractions which let us do so. For instance, we could build an abstraction which only allows us to capture ℕ, the natural or counting numbers.

from collections import namedtuple
from functools import total_ordering

class N(namedtuple("N", ["value"])):
  """A numeric value constricted to N (the counting numbers)."""

  def __new__(cls, value):
    if value < 0 or isinstance(value, float):
      raise ValueError("N is constrained to [0, 1, ...)")
    return super(N, cls).__new__(cls, value)

  def __int__(self):
    return self.value

  def __add__(self, other):
    return N(int(self) + int(other))

  def __sub__(self, other):
    return N(int(self) - int(other))

  def __eq__(self, other):
    return int(self) == int(other)

  def __lt__(self, other):
    return int(self) < int(other)

This isn't so different from the full Haskell implementation of Numeric.Positive.

There are techniques which can be used to try and capture more sense. This one is best known as "smart constructors". Rather than restrict the sense to the sign, we've defined a type which can only take on values conforming to our desired sense.

The additional semantic restrictions we can impose on ourselves from the type free us from either having to check the sense ourselves over and over again which is the most verbose and fragile path, or from settling for the integer sense we get "for free" and hoping that it's enough.

This tactic can be applied to any domain where we want to give a type to a subset of some other space of values. For instance strings of a known format. It is just window dressing, but frequently it's enough that we can approximate the semantics we actually want.

To take another example, what if we were to try and make a type for a Start of Authority (SOA) version? We could use a simple string; after all, that's what the SOA becomes when we lay it out in a zone file, and the only requirement we have of an SOA is that it increases, defining an ordering among zone file versions.

Generally however, strings (even constrained strings!) should be avoided. Text is the least structured form of data. Parsing is, even with the aid of parser combinators and regular expressions, difficult and error prone. It's far better to use fully realized data representations which can be rendered to strings than to keep data as a string and decode it.

Which carries more semantic meaning - the string "20170706001", or the tuple

from collections import namedtuple
from scanf import scanf  # 3rdparty

class SOA(namedtuple("SOA", ["year", "month", "day", "version"])):
  """A tuple capturing a DNS SOA format."""

  PATTERN = "%04d%02d%02d%03d"

  def __new__(cls, text):
    """Decode a SOA string, returning a SOA tuple"""
    # There is no `self` inside __new__, so the pattern is read from cls.
    # The scanf package takes the format first and the text second.
    return super(SOA, cls).__new__(cls, *scanf(cls.PATTERN, text))

  def __str__(self):
    """Encode this SOA as a string"""
    return self.PATTERN % self  # self is an appropriate 4-tuple already

print(repr(SOA("20170706001")))  # repr, since str() gives back "20170706001"
# SOA(year=2017, month=7, day=6, version=1)

The tuple can be rendered to and parsed from text as required, and above all it captures the sense in which we use SOAs, which leaving the SOA as a plain string does not. If strings are just passed through and not manipulated, then perhaps it's acceptable to keep them as strings and skip decoding. But structures like this, which capture more of the sense of our data, should be preferred.
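One thing the tuple buys us is the ordering we wanted in the first place, since tuples compare field by field. The sketch below is a variant under stated assumptions: it swaps the third-party scanf for the standard library's re module, targets Python 3, and the parse classmethod is a name chosen here for illustration.

```python
import re
from collections import namedtuple


class SOA(namedtuple("SOA", ["year", "month", "day", "version"])):
    """A DNS SOA serial split into its YYYYMMDDNNN fields."""

    _SHAPE = re.compile(r"(\d{4})(\d{2})(\d{2})(\d{3})")

    @classmethod
    def parse(cls, text):
        """Decode an SOA string into an SOA tuple, or raise ValueError."""
        match = cls._SHAPE.fullmatch(text)
        if match is None:
            raise ValueError("not an SOA serial: %r" % (text,))
        return cls(*(int(group) for group in match.groups()))

    def __str__(self):
        """Encode this SOA back into its zone file form."""
        return "%04d%02d%02d%03d" % self


older = SOA.parse("20170706001")
newer = SOA.parse("20170707000")
# Comparison is inherited from tuple: year first, then month, day, version.
assert older < newer
```

The comparison never looks at the text, so the ordering of versions is a property of the data rather than an accident of how the string happens to sort.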

Unfortunately there's often lots of other sense we may care about - for instance that a number is within the index bounds of some structure, or that for every possible value there exists a record in the database with that value as its id primary key. Checking these senses statically may well be impossible. There could be a concurrent change to the database, and a once valid identifier becomes dangling, and so forth.
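A small sketch of why a check like this can't be settled once and for all. The checked_index helper is hypothetical, and a plain list stands in for the database table.

```python
def checked_index(items, i):
    """Return i if it is a valid index into items right now, else raise."""
    if not 0 <= i < len(items):
        raise IndexError("%d is out of bounds" % i)
    return i


records = ["alice", "bob", "carol"]
idx = checked_index(records, 2)   # valid at the moment of the check

records.pop()                     # a later change elsewhere in the program

# The check said nothing about the future: idx now dangles.
still_valid = 0 <= idx < len(records)
```

The value of idx never changed; the world around it did, which is exactly the kind of sense a constructor-time check can't hold on to.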

Static types in the Haskell sense of things help. They allow you to capture some of the relevant sense - enough, in my experience, that software development may be easier because the type system helps you, especially when you need to change abstractions. But thanks to the incompleteness theorems we know all too well that they can never check every property we may care about.

4.6 Utility Is Contextual


It's tempting to think that we can, through great skill or sheer luck, discover perfect abstractions; Platonic ideal or Martian crystalline forms of a concept which satisfy all our needs for all time. The problem with this view is that different contexts have different needs. Many objects may generate the same shadow on the wall of Plato's cave.

For instance above I attacked the "stock" numeric abstractions which include hardware implementation details, suggesting that arbitrary precision math is a better default. I stand by that argument, but it must be recognized that there are occasions when we must make different tradeoffs in choosing our models.

Sometimes we need to interface with the machine. Sometimes we want to use (or author) bit vectors. Sometimes we don't have a memory manager and can't afford truly arbitrary precision arithmetic. Sometimes the performance degradation of operating on a double floating point number as opposed to a single or half width floating point number is meaningful, and we must make a choice to increase complexity and sacrifice ergonomics in the name of utility. That some machine type is the shared interface of some other abstraction, or that the externality of the machine's particular wall time performance is relevant to our context, changes what is appropriate.
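To make the tradeoff concrete, here is a minimal sketch contrasting Python's arbitrary precision integers and double precision floats with their narrower machine counterparts. The helpers add_u32 and as_float32 are hypothetical, written only to emulate the machine behavior.

```python
import struct

U32_MASK = 0xFFFFFFFF


def add_u32(a, b):
    """Add the way a 32-bit unsigned register would: wrap on overflow."""
    return (a + b) & U32_MASK


def as_float32(x):
    """Round a Python float (a double) to single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


big = 0xFFFFFFFF
exact = big + 1             # arbitrary precision: 4294967296
wrapped = add_u32(big, 1)   # machine-style arithmetic wraps around to 0

narrowed = as_float32(0.1)  # close to, but not equal to, the double 0.1
```

Neither answer is wrong in its own context. The wrapped sum is what a device register or wire format expects, and the narrowed float is what half the memory bandwidth costs.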

Just as we should exercise caution in choosing our names so that we don't choose overly narrow or broad names, so we must exercise caution in choosing our abstractions. The simplest possible abstraction with the fewest possible externalities may well not be the most appropriate to the domain, and worse still a simple abstraction is not likely to lend itself to change. Like other crystals, abstractions shatter or shear rather than bend from their natural shape.

Coupling of abstractions - allowing the tools you choose to share externalities and to expect one another - is dangerous, because it means that even a small deviation from the original requirements and context may prove cause for significant change with rippling consequences for your entire system.

5 Performance

The greatest performance improvement of all is when a system goes from not-working to working.

– Ousterhout, "My Favorite Sayings"

Performance and efficiency generally are factors in the measurement of how well a tool solves or responds to the needs of the "user", whether they be a person, business or other entity.

A code breaking machine which can decode a daily message in 32 hours isn't sufficient. It can't decode messages fast enough for them to be relevant. Such a machine is not useless, but it doesn't solve the problem set for it adequately. Three such machines together will provide, at a 32 hour delay each, the previous day's message, and perhaps a fourth or even pairs of machines may be supplied for reliability in order to provide a more-or-less continuous feed of intelligence. But it would still be yesterday's data today, delivered at great cost.

In this example we have an objective criterion of how much performance is "enough". The cracking machine must have a cracking rate of at least 1 msg per 24 hrs, and we have an objective measurement that we're only able to crack at the rate of 1 msg per 32 hrs.

There are two primary approaches to achieving performance where it is required. The first approach is to simply do less work - either by reducing the problem using algorithmic optimization, or by reducing the size of inputs by firing customers or negotiating system scope. When optimization must occur, algorithmic or organizational optimization should be by far preferred in the interest of overall program simplicity with all its other virtues.

The second approach is to tune what work must be done to better suit the machinery we use to do it. This is what we usually mean when optimization comes up, although in general we should prefer to find ways to simplify and reduce the problem.

Two opinions about programming date from those days. … The one opinion was that a really competent programmer should be puzzle-minded and very fond of clever tricks; the other opinion was that programming was nothing more than optimizing the efficiency of the computational process, in one direction or the other.

– EWD, "The Humble Programmer"

To be clever means to be "sharp", "quick" and "skillful". The old English roots suggest a sense of "cutting" and "keenness". It would be cleverness to design a tool which does the difficult or seemingly impossible within resource constraints by leveraging intimate knowledge of the system.

This formulation of a "clever trick" implies that a really clever trick must also have a high complexity. This is the root of the trouble with optimizations of the second kind. Clever tricks while perhaps relevant in the moment are in the long term likely to prove troublesome.

Worse, clever tricks of this kind require specializing some part or all of your program so that it reflects performance considerations. This has the particular negative consequence of meaning that your code cannot be so simple and algorithmic anymore - it must be reflective of some state(s) or trick(s) used to achieve acceptable performance because performance is now a crosscutting concern which must be accounted for by your abstractions.

Frequently, reasoning from first principles is not enough to produce fully optimized code. Modern high performance sorting algorithms are absolutely fascinating, because while the algorithmic complexity may suggest that provably optimal algorithms such as mergesort or quicksort should be universally adopted it turns out there are frequently ways to cheat and do better. Modern collections libraries use adaptive sorting strategies which benefit from existing sortedness in the input collection and frequently contain multiple different sorting implementations which are applied to different sizes of collection.
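A minimal sketch of the "multiple implementations for different sizes" idea: a merge sort which hands small slices to insertion sort. The cutoff of 16 is an arbitrary assumption rather than a tuned value, and production sorts (Timsort, for one) are considerably more elaborate than this.

```python
def insertion_sort(xs):
    """O(n^2) in general, but cheap on short or nearly sorted input."""
    out = list(xs)
    for i in range(1, len(out)):
        item = out[i]
        j = i - 1
        while j >= 0 and out[j] > item:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = item
    return out


def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def hybrid_sort(xs, cutoff=16):
    """Merge sort which switches to insertion sort below the cutoff."""
    if len(xs) <= max(cutoff, 1):
        return insertion_sort(xs)
    mid = len(xs) // 2
    return merge(hybrid_sort(xs[:mid], cutoff), hybrid_sort(xs[mid:], cutoff))
```

Where the cutoff should actually sit is an empirical question about a particular machine and workload, which is the point: the asymptotic analysis alone doesn't tell you.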

Even apparently simple statements like "doing two string concatenations will be slower than doing one" are impossible to evaluate without a real effort at profiling. The JVM, among other platforms, recognizes that string manipulation is "expensive" due to the need to resize and copy buffers. However, both the Oracle JDK and OpenJDK actually include several optimizations which detect code doing trivial string concatenations and silently replace it with equivalent code which uses the StringBuilder class, which delays buffer resizing as long as possible.
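The same question can be put to any runtime. Here is a minimal measurement sketch in Python rather than on the JVM; concat_plus, concat_join and measure are hypothetical helpers, and the input size and repeat count are arbitrary.

```python
import timeit


def concat_plus(parts):
    """Build a string with repeated +, the 'obviously slow' way."""
    out = ""
    for p in parts:
        out = out + p
    return out


def concat_join(parts):
    """Build a string with a single join."""
    return "".join(parts)


def measure(fn, parts, repeats=200):
    """Seconds taken to call fn(parts) `repeats` times."""
    return timeit.timeit(lambda: fn(parts), number=repeats)


parts = ["x"] * 1000
plus_seconds = measure(concat_plus, parts)
join_seconds = measure(concat_join, parts)
# No expected numbers are given on purpose: they depend on the interpreter,
# its version and the machine.
```

Whichever way the numbers come out on your machine, they are a measurement rather than a guess, and they may well change with the next runtime release.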

If you want fast, start with comprehensible. You can't make it fast if you can't change it.

– Paul Phillips, "We're Doing It All Wrong" (2013)

Unless there are known constraints, such as the hard throughput requirement of a message a day here, the question "is it fast enough?" and the statements "this isn't performant" or "we need to do this for performance" are totally meaningless. Worse, they detract from the goal, which should be to ship a tool which serves a purpose, and introduce artificial pressure to conform the choice of abstractions to arbitrary false goals.

6 Fin: Purism

Niklaus Wirth's work is notable superficially because Wirth is himself a prodigious hacker. He's responsible for, among other projects, no fewer than five separate programming languages and two operating systems. What's the trick? How does he do it?

In the implementation of the Modula-2 compiler, Wirth and his students used two metrics to estimate and track its quality over the course of its evolution. One was the simple line count of the program - a poor proxy for complexity but a data point nonetheless. Another was the time which it took the compiler to compile itself.

One of Wirth's students implemented an elegant hash table backed by binary trees, for use in mapping strings to symbols - to look up local variables and implement lexical scope. Doing so introduced a new, complex datastructure, increased the line count of the compiler and added at best no performance benefit, for it turned out that most scopes in real programs were small, and for small scopes a simple association list could be faster. Wirth ripped the hash table out.
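For reference, an association list really is about as simple as a symbol table gets. This is a hypothetical sketch in Python, not anything taken from Wirth's compilers: a scope is a list of (name, symbol) pairs searched linearly, with a parent link for lexical nesting.

```python
class Scope:
    """A lexical scope backed by an association list of (name, symbol) pairs."""

    def __init__(self, parent=None):
        self.bindings = []      # newest binding last
        self.parent = parent

    def define(self, name, symbol):
        self.bindings.append((name, symbol))

    def lookup(self, name):
        # Linear scan, newest first, so redefinitions shadow older ones.
        for bound_name, symbol in reversed(self.bindings):
            if bound_name == name:
                return symbol
        if self.parent is not None:
            return self.parent.lookup(name)
        raise KeyError(name)


outer = Scope()
outer.define("x", "outer x")
inner = Scope(parent=outer)
inner.define("y", "inner y")
inner.define("x", "inner x")    # shadows the outer x
```

With a handful of bindings per scope the linear scan touches very little memory, which is the situation in which it can keep up with, or beat, a fancier structure.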

Wirth was also a great user of tools, such as Backus-Naur Form and parser generators for expressing precisely the meaning which he wished to ascribe to a given set of symbols and the notation which his languages should represent. Even in these, Wirth's ascetic attachment to simplicity is clear - all of his languages are simple and can be efficiently implemented without any clever tricks whatsoever.

Simplicity and quality aren't properties which code or systems get by osmosis or by grant of the gods. They are the product of deliberate effort by programmers to resist the entropic pressure applied by reality and customer requirements on systems and to in spite of that pressure carve out cohesive, appropriate abstractions which leave room for growth.

It would be equally absurd, then, to expect that in unjust, cowardly, and voluptuous action there should be a mean, an excess, and a deficiency; for at that rate there would be a mean of excess and of deficiency, an excess of excess, and a deficiency of deficiency. But as there is no excess and deficiency of temperance and courage because what is intermediate is in a sense an extreme, so too of the actions we have mentioned there is no mean nor any excess and deficiency, but however they are done they are wrong; for in general there is neither a mean of excess and deficiency, nor excess and deficiency of a mean.

– Aristotle, "Nicomachean Ethics", Book II, Ch. 6

In the "Nicomachean Ethics", Aristotle (himself a student of Plato) develops a system of ethics on the basis that one must always choose the mean between excess and deficiency.

There is a fashion in the software community to be "pragmatic". That is, to be tempered or practical with respect to an attitude or policy. Unfortunately, if you temper temperance with an added dose of reality, you no longer have "pragmatism", you have a regression. Software as an industry is built on an incrementalism of "progress" which is informed by experience but largely unaware and often dismissive of the possibility space.

Before we can claim to be pragmatic, we must have a vision of the perfection from which we turn aside in practical interests. It is my opinion that the overwhelming majority of working programmers are not conscious of what could be; of how higher order tools could help them; of how the pains they feel are the fault of specific failures of their tools and are not essential.

My goal with this talk was to share with you some of the philosophy and purism which I carry with me and which I hope helps me to strive for more perfect solutions. I hope that it helps you too.


Techwriting 101

At Twitter we have regular employee training courses on a number of subjects. These are my notes from our class on writing technical documentation for developers.

What are docs

Anything from diagrams to stained bar napkins. Could be a book, could be a public website, could be an internal resource.

Docs are the way that experts re-visit subject domains. They're how your customers / other service owners read in on your system. They're how new hires will start orienting themselves to the Twitter environment.

Why docs

Google study: 1/3rd of all code searches are looking for examples or specific API information.

Django: 120Kl of English, 80KloC. If you haven't written your docs you haven't written your product.


Gonna go over three main sections...

  1. The secret to writing good docs (know thy audience)
  2. Writer's block
  3. Polishing (patterns & antipatterns)

1. Know Thy Audience

Who are you preparing the document for? Other developers? Non-technical users? Yourself in six months? An SRE at 2 a.m. when things are on fire? People with no context (far distant teams or new hires), and remotes who need to be able to find information without constantly standing on someone?

Common kinds of documents & audiences (think job to be done)

  1. Introductory material
  2. Runbooks
  3. Service docs
  4. Design docs
  5. Cookbooks
  6. Team pages
  7. Troubleshooting
  8. Style guides
  9. Policy pages
  10. Roadmaps

What do your readers know coming in?

Different sets of documentation serve different purposes. What knowledge do they have and what can you assume?

Most developer docs err on the side of assuming knowledge. Lots of possible reasons for this, but it's an antipattern to be aware of.

What's the takeaway your document should communicate?

Doesn't have to be 5pgf MLA format, but you should have a TL;DR or thesis. This helps avoid problems like the interminable FAQ.

Audience Creep

Know the scope of the doc. Often you'll incrementally expand the vision of the document until it tries to be everything to all people at which point it's kinda useless.

2. Writer's Block


A metaphor: activation energy. Anything you can do to lower the activation energy encourages the reaction to occur. Once you get into a project you're "in the flow" and it's easy to keep hacking/writing, and you want to do whatever you can to stay there.

Forcing functions: pomodoro / the most dangerous writing app etc. Lifehacks and so forth abound here.

Often the way to get started is to just braindump.

2½. The documentation lifecycle

  1. Rapid development - visual artists & cartoonists. You start with shapes and just start doodling, iteratively refining. Someone dumped a wall of text, someone else organized it, someone else added examples... etc.

  2. Feedback - People read it and comment. Someone else goes on-call and has comments / discovers things which aren't written down. Even coming back to it the next day helps, but really it's about getting someone else who doesn't have the same context to look at the doc. This is why docbird has code review as part of the workflow, it ensures that someone else has to put eyes on the document. You've got to build this into your team's workflow.

  3. Maintenance - Documentation which isn't true to the live system is pretty much worthless.

3. Writing really super amazing docs!

Antipattern: docs that aren't docs

It's not documentation to tell someone to ask someone else. Guideposts of where to do discovery are helpful but they aren't sufficient.

Antipattern: nerdview

Defined by the Language Log crowd

Nerdview is the tendency for people with any kind of technical or domain knowledge to get hopelessly stuck in their own frame of reference

Imagine reserving a rental car online, and being asked to enter a date in DD/MM/YYYY format and getting an error message "Please pick a date greater than today". Nobody says "greater than today". This sentence only makes sense when you're thinking about dates as integral values which you're gonna be comparing as in a computer system.

This is really just an instance of knowing your audience.

Antipattern: noun piles "crash blossoms" or "garden path sentences"

Just adding hyphenation to clarify qualification can help a lot.

Antipattern: Academese/technicalese

Don't slip into formalism just because you're writing docs.

Antipattern: slang

Non-english-first speakers exist. We employ lots of 'em. Know thy audience and try not to over-use slang if only for the purpose of clarity.

Antipattern: Buzzwords

Type-safe! Modular! Parameterized!

Sometimes these help, sometimes these just add volume. Be aware.

Antipattern: Acronyms

Define them! Did you know that we have TSA and ACID at Twitter, neither of which mean what you think they mean in common outside-Twitter usage?

Antipattern: Birds


Pattern: Simplicity all the way down

Simple pages

10-20min ABSOLUTE LIMIT on a given page of docs. Some say 5-10min is better... it depends on what your use case is and how you expect your audience to read. Again think what is the job of the document and what kind of attention do you expect your reader to direct to it?

Simple sections

Sections should be single-topic and targeted.

"Tasks not tools". Title your section in terms of what problem your reader is likely trying to solve, because only rarely is your audience reading to learn rather than reading to solve a problem. Can you title your section as a question or a gerund rather than as a spare noun?

Remember that while you may write your docs front-to-back people will tend to discover them via some search tooling, so you shouldn't be afraid of repetition and certainly not of linking back to things you've already written about.

Simple paragraphs

Don't be afraid of single sentence paragraphs.

Don't present information that's better rendered tabularly or as a diagram as prose just because that's what's convenient / easy to write from pick your editor of choice.

Simple Sentences

"Someone said that words are a lot of inflated money - the more of them you use the less each one is worth"

~ Malcolm Forbes

Simple Words

The beginning of knowledge is to call something by its proper name

~ Confucius

Coherence vs. Completeness

Your doc should hang together. You should be able to sum up topics. This fits naturally into the context of completeness.

Completeness is kinda naturally at odds with coherence. Completeness is saying everything, and coherence is saying just enough or just what is needed. Same as #shipit vs #berigorous. What are the top three use cases for this doc? What are the next three? Can you hide complexity with links?

You can kinda only keep five things in your working memory at once… three is kinda this nice magical number. Linking is a great way to reference complexity without becoming massively monolithic.

Templates & metadata

Often times you'll open a document and kinda have no idea who the author was and what the status was and what the changelog was and so on and so forth.

By using templates which provide those empty headings/sections/fields it becomes easier to get it right and write all that down.

Templates also help you be consistent in the structure of your documentation, which means that users can predict and rely on the structure of your content which makes docs more approachable.

Correctness, Maintenance & Culture of Documentation

  • Docs have to be treated as a first class priority & are part of the product.
  • Products aren't shipped unless you have docs &c.
  • Teams recognize that docs take time and allocate time for building documentation.

Formatting & Variation

In lots of docs you have this "wall of text" antipattern. This is particularly bad in confluence because the editor is weak and lines are long.

Formatting helps make the high-level structure of the document easier to understand / visualize / work with.

Lists, boldface, italics and break out boxes help display structure tastefully. BUT DON'T OVERDO IT. If everything has a warning box, your users will ignore all of them and so forth.

Aside: Overdoing it. If you can't simplify the techdocs, it's commonly a smoking gun that the thing you're documenting is too complex.

Edward Tufte's The Visual Display of Quantitative Information.

Pie charts are the worst. Bar charts and tables are better almost universally.

Closing bits

  • Examples! Write them! They should be as the user will use them!
  • Interleave code and prose! Interactive examples are awesome!
  • Permalinks! Twitter has its own nice internal permalinks system, but these can make it a lot easier to find and reference the document you want.
  • Use a prose linter! ( etc.)
  • Teach a class / make a video!

Chowderheads [2004]

Volker Armin Hemmann wrote:

if we are talking about people who are only able to install gentoo because of an automated, graphical installer, then we will get: a lot more 'bug reports' that are not ones. a lot more really really stupid questions (I wait for the day, someone asks, where he can find the gentoo-homepage), and no new developers (but a lot more work for the existing ones).

One might also imagine we'd get less questions on actually installing Gentoo and more on doing stuff with it. There I go again with my tricky logic.

All successful and useful projects get new people. It's a fact of life and frankly if you aren't, you're doing something wrong. That holds true from the Gentoo project to the Roman Empire. If you can not integrate new people successfully into your organization, it will fail.

Gentoo has in fact from the very start been about making things easier. Easier to pick the packages you want, easier to upgrade, easier to custom build packages, easier to update etc files, etc. Gentoo has even gone out of its way to make better how-tos and is known in the Linux community at large for having just about the most useful and friendly forums.

Gentoo can either continue extending the infrastructure to support the people being attracted to a damn useful distro. Or clowns like you can attempt to keep Gentoo all to yourself with Jim Crow style exclusionary tactics.



Puppet, I guess

So I have, or rather should say had, a problem.

At work, I'm partially responsible for managing a large puppet deployment. Puppet is largely configured in terms of providers packaging resources & templates which are used to lay down configuration files on disk to manage host behavior. It is after all a configuration management tool. And it works great for the problem domain of "I have lots of servers with lots of state to configure".

But I have dotfiles. The way that I manage dotfiles needs to capture the idea that I want to merge many different components together into the state of the given machine. Sounds like puppet...

But my dotfiles have to be self-bootstrapping, and they can't presume access to any particular master node. These two requirements are actually pretty firm, because whenever I bring up a new laptop or a new VPS it's basically clone down my dotfiles, run an install script in the repo and done.

GNU Stow has served me really well so far. It totally solved my initial dotfiles problem, which was "I want to just symlink a bunch of stuff out of a repo".

When I wanted to start organizing the repo, stow became less ideal and I wound up with a wrapper script around stow which had some notion of sets of repository delivered stow packages to install on a given host. And that worked for quite a while on my personal machines.

Then I had a problem at work. I use ~/.zsh/conf/* as a holding pen for a bunch of configuration files regarding variables, functions and general shell bringup. When I was only configuring my personal machine(s), stow worked fine because I had the same settings everywhere. Stow installs a whole zsh package from the repo, which includes a .zsh and a .zsh/conf and you're all set.

The problem came when I realized that some of my configurations only apply to my work machine. Because I use a Mac at work, I have some extra stuff I need to do with the path and homebrew to get things... quite right and I don't want to burden my Linux install(s) with all that. A monolithic zsh package won't cut it anymore.

But Stow only deals with packages consisting of whole directories! It won't let you manage single files! Really what I want is not a stow which lays down symlink directories, but a tool which makes real directories and symlinks in files. That way I could write several packages, say zsh and work_zsh which would merge together into the ~/.zsh tree the way I want.

So let's start laying out files on disk.

function arrdem_installf () {
  # $1 is the logical name of the file to be installed
  # $2 is the absolute name of the file to be installed
  # $3 is the absolute name of its destination
  # Installs (links) a single file, debugging as appropriate

  if [ -n "$DEBUG" ]; then
      echo "[DBG] $1 -> $3"
  fi
  ln -s "$2" "$3"
}

export -f arrdem_installf

Okay so that's not hard. Just requires ln, makes appropriate use of string quoting and has support for a debugging mode. We're also gonna need a thing to lay out directories on disk.

function arrdem_installd () {
  # $1 is an un-normalized path which should be created (or cleared!)

  dir="$(echo "$1" | sed 's/\.\///g')"
  if [ ! -d "$dir" ]; then
      if [ -n "$FORCE" ]; then
          rm -rf "$dir"
      fi

      mkdir -p "$dir"
  fi
}

export -f arrdem_installd

Okay so this function will accept an un-normalized path (think a/./b/c), normalize it (a/b/c) and ensure it exists. Because we're interacting with a file/directory tree which Stow used to own, it's entirely possible that there used to be a symlink where we want to put a directory. So we have to support a force mode wherein we'll blow that away.

Alright... so now let's write a stow alternative that does what we want.

function arrdem_stowf () {
  # $1 is the stow target dir
  # $2 is the name of the file to stow

  f="$(echo "$2" | sed 's/\.\///g')" # Strip leading ./
  TGT="$1/$f"                        # Assumed: where the link lands under the target dir
  ABS="$(realpath "$f")"

  if [ -h "$TGT" ] || [ -e "$TGT" ]; then
      if [ "$(realpath "$TGT")" != "$ABS" ]; then
          if [ -n "$FORCE" ]; then
              if [ -n "$DEBUG" ]; then
                  echo "[DBG] Clobbering existing $ABS"
              fi
              rm "$TGT"
              arrdem_installf "$f" "$ABS" "$TGT"
          else
              echo "[WARNING] $TGT already exists! Not replacing!"
              echo "          Would have been replaced with $ABS"
          fi
      else
          if [ -n "$DEBUG" ]; then
              echo "[DEBUG] $TGT ($f) already installed, skipping"
          fi
      fi
  else
      arrdem_installf "$f" "$ABS" "$TGT"
  fi
}

export -f arrdem_stowf

So if we just want to install a single file, we normalize the file and compute the path we want to install the file to. Now it's possible, since this is a symlink based configuration management system, that the target file already exists. The existing file could be a dead (old) symlink, or it could be a link we already placed on a previous run. We can use realpath to resolve symlink files to their paths, and determine whether we have a link that already does what we wanted to do. In the case of an existing file and $FORCE we'll clobber, otherwise we'll only install a new link if there isn't something there.

Great so this deals with installing files, assuming that we want to map from ./foo to ~/foo.

Now we can really write our stow, which will eat an entire directory full of files & subdirectories and emplace them all.

function arrdem_stow () {
  # $1 is the install target dir
  # $2 is the source package dir
  # Makes all required directories and links all required files to install a given package.

  # If a target directory doesn't exist, create it as a directory.
  # For each file in the source, create symlinks into the target directory.
  # This has the effect of creating merge mounts between multiple packages, which gnu stow doesn't
  # support.
  ( cd "$2"

    # Make all required directories if they don't exist
    # If force is set and something is already there blow it the fsck away
    find . -mindepth 1 \
         -type d \
         -exec bash -c 'arrdem_installd "$0/$1"' "$1" {} \;

    # Link in all files.
    # If the file already exists AND is a link to the thing we want to install, don't bother.
    # Else if the file already exists and isn't the thing we want to install and force is set, clobber
    # Else if the file already exists emit a warning
    # Else link the file in as appropriate
    # Note that this skips INSTALL, BUILD and README files
    find . -type f \
         -not -name "INSTALL" \
         -not -name "BUILD" \
         -not -name "README.*" \
         -exec bash -c 'arrdem_stowf "$0" "$1"' "$1" {} \;
  )
}

export -f arrdem_stow

It turns out that the ONLY really bash safe way to support filenames and directory names containing arbitrary whitespace or other characters is to use find -exec bash, rather than parsing the output of find. If you try to parse find's output, you wind up having to designate some character as special and the delimiter. I thought that whitespace was a safe assumption and found out I was wrong, so wound up taking this approach of using the -exec option to construct recursive bash processes calling my exported functions. Which is why I've been exporting everything all along.

So given ~ as the install target, and ./foo/ as the package to install, we'll cd into foo in a subshell (so we don't leak CWD state), find & arrdem_installd all the required directories. Then it will arrdem_stowf all the files, with a couple exceptions. README files, BUILD and INSTALL files are exempted from this process so we don't litter ~ with a bunch of files which aren't logically a part of most packages.
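For comparison, here is the same merge behavior as a minimal Python 3 sketch. It is an illustration of the idea rather than a port of the script above: the function name stow and the force flag are choices made here, and it leans on the standard os and fnmatch modules.

```python
import fnmatch
import os

SKIPPED = ("INSTALL", "BUILD")


def stow(target, package, force=False):
    """Make real directories under target and symlink package files into them."""
    package = os.path.abspath(package)
    for root, _dirs, files in os.walk(package):
        rel = os.path.relpath(root, package)
        dest_dir = os.path.normpath(os.path.join(target, rel))
        if not os.path.isdir(dest_dir):
            if force and os.path.lexists(dest_dir):
                os.remove(dest_dir)        # a stale link or file is in the way
            os.makedirs(dest_dir)
        for name in files:
            if name in SKIPPED or fnmatch.fnmatch(name, "README.*"):
                continue
            src = os.path.join(root, name)
            dst = os.path.join(dest_dir, name)
            if os.path.lexists(dst):
                if os.path.realpath(dst) == os.path.realpath(src):
                    continue               # already installed by an earlier run
                if not force:
                    print("[WARNING] %s already exists! Not replacing!" % dst)
                    continue
                os.remove(dst)
            os.symlink(src, dst)
```

Because only files are linked and directories are real, calling stow for a zsh package and then for a work_zsh package leaves one ~/.zsh/conf directory holding links into both.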

Okay. So we can install packages. Great.

This lets me build out a tree like


Which will get the job done, but still leaves me with the problem of picking and choosing which packages to install on a given host. I can solve this problem by going full puppet, and defining a concept of a profile, which consists of requirements of other profiles or packages. When I ./ on a host, it's gonna try to install the profile named $(hostname) first, falling back to some default profile if I haven't built one out for the host yet.

So really the tree will look like


where requirements files will be special and let a profile list out the other profile(s) and package(s) which it should be installed with. This lets me say for instance that the profiles.d/work profile is defined to be profiles.d/home plus the package work-zsh. Or rather, that profiles.d/work is profiles.d/default plus a bunch of stuff, and profiles.d/home is entirely independent and includes configuration(s) like my Xmonad setup which are irrelevant to a Mac.

So first we need to be able to install something we consider to be a package

function install_package() {
  # $1 is the path of the package to install
  # Executes the package build script if present.
  # Then executes the install script if present, otherwise using arrdem_stow to install the built
  # package.

    echo "[INFO - install_package] installing $1"

    if [ -x "$1/BUILD" ]; then
        ( cd "$1";
          ./BUILD )
    fi

    if [ -e "$1/INSTALL" ]; then
        ( cd "$1";
          ./INSTALL )
    else
        arrdem_stow ~ "$1"
    fi
}

export -f install_package

Packages are directories which may contain the magical files README, BUILD and INSTALL. If there's a BUILD file, execute it before we try to install the package. This gives packages the opportunity to do host-specific setup, such as compiling fortune files with ctags.

The INSTALL script gives packages an escape hatch out of the default package installation behavior, say installing OS packages rather than emplacing resources from this directory.

Otherwise, we just treat the directory as a normal stow package and install it.

Okay, so now we need to support profiles.

function install_profile() {
  # $1 is the path of the profile to install
  # Reads the requires file from the profile, installing all required profiles and packages, then
  # installs any packages in the profile itself.

  if [ -d "$1" ]; then
      echo "[INFO] installing $1"

      # Install requires
      REQUIRES="$1/requires"

      if [ -e "$REQUIRES" ]; then
          while read -r require; do
              echo "[INFO] $require"
              case "$require" in
                  profiles.d/*)
                      echo "[INFO - install_profile($1)] recursively installing profile $require"
                      install_profile "$require"
                      ;;

                  # Anything else is treated as the path of a package
                  *)
                      echo "[INFO - install_profile($1)] installing package $require"
                      install_package "$require"
                      ;;
              esac
          done < "$REQUIRES"
      fi

      # Install the package(s)
      find "$1" \
           -maxdepth 1 \
           -mindepth 1 \
           -type d \
           -exec bash -c 'install_package "$0"' {} \;
  else
    echo "[WARN] No such profile $1!"
  fi
}

So if there is a directory with the given profile name, then if there is a requires file, pattern match profiles & packages out of the requires file & install them. For good measure, install any packages which may be included in the profile's directory.
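The dispatch over the requires file is easy to try out on its own as a dry run. This is just a sketch: classify_require is a hypothetical helper, the sample entries (including the packages.d/ path) are made up, and it assumes that anything which isn't under profiles.d/ should be treated as a package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report what would be done with one line of a requires file.
function classify_require() {
    case "$1" in
        profiles.d/*) echo "profile $1" ;;
        # Assumption: everything else names a package.
        *)            echo "package $1" ;;
    esac
}

# Sample requires file; these entries are invented for illustration.
requires_file="$(mktemp)"
printf '%s\n' 'profiles.d/default' 'packages.d/work-zsh' > "$requires_file"

# Read the file line by line, exactly one entry per line.
plan="$(while read -r require; do classify_require "$require"; done < "$requires_file")"
echo "$plan"

rm "$requires_file"
```

Swapping the echo calls for install_profile and install_package gets you back to the real behavior.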

Now we just need a main to drive all this.

function main() {
  # Normalize hostname

  HOSTNAME="$(hostname | tr '[:upper:]' '[:lower:]')"
  # Host profiles are assumed to live alongside the others under profiles.d/
  HOST_DIR="profiles.d/$HOSTNAME"

  if [ -d "$HOST_DIR" ]; then
      # Install the host profile itself
      # It is expected that the host requires default explicitly (or transitively) rather than getting
      # it "for free".
      echo "[INFO - main] installing profile $HOST_DIR"
      install_profile "$HOST_DIR"
  else
    # Otherwise just install the default profile
    echo "[INFO - main] installing fallback profile profiles.d/default"
    install_profile "profiles.d/default"
  fi
}


At this point we've built out a shell script which depends only on find, bash and realpath but can support some really complex behavior in terms of laying down user config files. As hinted above, this could install OS packages (or homebrew).

By making heavy use of foo.d directories, it becomes super easy to modularize configurations into lots of profiles & merge them together for emplacement.

Best of all, in debug mode it becomes pretty easy to sort out what's coming from where with a grep, or you can just stat the emplaced symlink(s), which will give you a fully qualified path back to the resources they alias.
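For instance, here's a small sketch of chasing a symlink back to its source. The scratch layout and file names are invented for illustration, and it uses readlink rather than stat since readlink prints just the target.

```shell
#!/usr/bin/env bash
# Hypothetical layout: a "repo" holding the real file and a "home" holding the symlink.
scratch="$(mktemp -d)"
mkdir -p "$scratch/repo/zsh" "$scratch/home"
echo 'export EDITOR=emacs' > "$scratch/repo/zsh/.zshrc"

# Emplace the file the way a stow-style installer would, as an absolute symlink.
ln -s "$scratch/repo/zsh/.zshrc" "$scratch/home/.zshrc"

# readlink prints the path the symlink aliases.
target="$(readlink "$scratch/home/.zshrc")"
echo "$target"

rm -r "$scratch"
```

Because the link was created with an absolute path, what comes back is the fully qualified location of the file in the repository.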

Not bad for a one-day garbage puppet implementation.

The code for this monstrosity is available here as a gist, but comes with a disclaimer that it's a snapshot of the working state of my dotfiles repository as of this article's writing and may be suitable for no purpose including my own usage.


Last Mile Maintainership

So Daniel Compton (@danielwithmusic) is a good bloke. We've been co-conspirators on a number of projects at this point, and I just wanted to share a quick vignette before I pack it in for the night.

Almost a year ago, James Brennan (@jpb) was kind enough to offer up a pull request (#158) to the kibit tool which Daniel and I currently maintain. We're both relatively inactive maintainers all things told. Kibit largely does what it's supposed to do and is widely used, for which neither of us can take much credit. We're stewards of an already successful tool.

Between Daniel's day job and my move out to San Francisco, #158 just got lost in the shuffle. It's an awesome feature. It enables kibit to automatically refactor your code for style, using the excellent rewrite-clj library. If kibit can find a "preferred" replacement expression, then thanks to James's work in #158 kibit can make the replacement for you. While Daniel and I kinda just watched, James pushed it to feature completeness and found a significant performance win which made it not just a compelling feature but fast enough that you'd want to use it.

Then a couple months passed. Daniel and I had other things to do and presumably so did James.

About 11 hours ago now, I saw a github notification about a new comment - "Feature: lein kibit fix or similar that applies all suggestions" (#177). Cody Canning (@ccann) was suggesting exactly the feature James had offered an implementation of.

At this point James' patch adding exactly this feature had sat idle for many months. Some other things had come in, been more active and been merged. James' changeset now had conflicts.

Following the github help docs for how to check out a pull request (spoiler: git fetch $UPSTREAM_REPO pull/ID/head:$NEW_LOCAL_BRANCHNAME) I had James' patches on my laptop in less than a minute. git merge immediately showed that there were two sources of conflict. One was that the kibit driver namespace had had its namespace form refactored for style and a docstring had been added to the main driver function which James' patches touched. The other was that dependencies had been bumped in the project.clj.
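If you want to see that fetch work without touching a real project, here's a sketch which fakes the whole thing offline. The throwaway "upstream" repository, the demo identity and the pr-158 branch name are all made up; the ref name refs/pull/158/head just mirrors the pull/ID/head form from the command above.

```shell
#!/usr/bin/env bash
work="$(mktemp -d)"

# Stand-in for the upstream repository, with one commit published under a pull request ref.
git init -q "$work/upstream"
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "proposed change"
git -C "$work/upstream" update-ref refs/pull/158/head HEAD

# A maintainer's checkout; the "origin" remote plays the role of $UPSTREAM_REPO.
git init -q "$work/local"
git -C "$work/local" remote add origin "$work/upstream"

# Fetch the pull request head into a new local branch, then look at what arrived.
git -C "$work/local" fetch -q origin pull/158/head:pr-158
pr_subject="$(git -C "$work/local" log -1 --format=%s pr-158)"
echo "$pr_subject"

rm -rf "$work"
```

From there it's an ordinary local branch, so merging and fixing conflicts works the same as for anything else you have checked out.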

Fixing this took.... five minutes? Tops?

The test suite was clean, and within 11 minutes Daniel had merged my trivial patch on top of James' awesome work. Done and live.

The whole process of about 10 months was overwhelmingly waiting. James finished the patch in like four days (April 20 '16 - April 26 '16). Daniel and I were just bad maintainers at getting it shipped.

Were Daniel and I worse maintainers, we could have seen #177 come in and asked either Cody or James to update the patch. It would have taken maybe five minutes tops to write that mail and maybe it would have saved me 15 minutes and Daniel 5.

After months of waiting? Why?

I've written before about my own thoughts on code review after working in an organization which is high trust, high ownership and sometimes, it feels, high process anyway. In this case, and I'm sorry to say almost a year late, I went by what I've come to believe: that reviewers should make an effort to take responsibility for merging code rather than requiring the primary author to do all the leg work.

Sure, I could probably have pinged James or suckered Cody into writing that merge commit, but why? What does that buy anybody? It was so, so easy to just take James' changes and merge them myself rather than asking someone else for trivial revisions. And it makes for a better process for contributors. It's not their fault that your project has grown merge conflicts with their changes.

If there had been a huge conflict, or James' changes had seemed somehow deeply unreasonable it would have been a different story.

But going the last mile for your contributors is worthwhile.