Architecture and Microarchitecture

From a recent paper review I wrote -

So @ztellman is writing a book The Elements of Clojure which would better be titled The Elements of Software. It’s almost entirely about exploring ideas about abstraction and how abstraction and engineering interact. Maybe worth perusing since this paper seems to hit a bunch of related ideas. To crib some of those:

Science is about dealing with timeless things. The lambda calculus has no engineering constraints. It’s a great substrate for proof, and takes some head standing to become an adequate substrate for doing “real” work because doing “real” work is characterized by trying to fit within a finite machine and finite time.

Engineering is about shoveling the constants around to optimize for how many transistors we can fab at what density and how many we can drive / cool at what power without melting the whole thing. That is, it's entirely an exercise in factors which are far from timeless and thus not suitable for long lasting abstractions. Architectures like the Itanium, MIPS and TTAs you allude to are the final form of this kind of thinking. It may be momentarily convenient, say for performance, to expose, as in the MIPS, a pipeline without interlocks or the full physical register set, but as the paper conveys well this constitutes a fragile API which will prove limiting in the future when the engineering constraints which motivated those decisions change.

The bit you don't seem to hit on yet is what is architecture, and how do we do it? I suggest, still cribbing from Zach, that architecture both in software and in hardware is all about trying to find the perfect layer of indirection between the engineering constraints of the present (time, memory, chip area, heat, etc.) and the universals/existentials that represent a stable target. That is, you're trying to find what set of cost trade-offs you can presume are constant-ish for as long as you hope your design will be relevant. Will we still be monocore and care about single cache density? Will the NUMA gap continue to increase? Will we continue to be able to rely on speculation for performance wins, or will we switch to more communication based models of computation where privilege level context switches must become more expensive and communication must be efficient?

To cherry-pick your CAR / CDR example of an architecture feature which has lasted far too long: while the names are absurd and now long irrelevant, the notion of the first/next pair has proven to be a general, timeless representation for arbitrary graph structures. It has been repeated time and time again, silly names included, even in the face of memory architecture [1] changes which render the original conception less and less appropriate.

Straight stack, thunk or SECD VMs follow hot on the heels of really trying not to assume anything. Maybe that's fine. Maybe as you hint at the end, an abstract register machine with some support for generic operations in the style of the Cray-1 which may be specialized or implemented with accelerators later is appropriate.

The central idea that this paper seems to be toeing is that as implementers with our eyes on what we’ve built our natural tendency when “specifying” is to write down the specific thing we’ve envisioned. From a cognitive laziness perspective it’s the natural thing to do. Taking a step back and realizing what’s essential versus what’s incidental is really goddamn difficult because their (our) job is to have our heads so far down in the microarchitecture that we’re counting clock cycles and cache lines in our heads before we even have a simulator. This sort of clock cycle counting - engineering and the failure of architecture - is what leads to the microarchitecture-as-architecture trap sketched above.

Dunno how much of this is useful without turning it into a philosophy paper, but it does seem like there’s plenty of room to hit on these issues a little more directly and be concrete about the possible advantages of abstract structures like the admittedly absurd x86 floating point stack unit.

牛: How static is enough?

Ox has been and will continue to be a long-haul project in part because it's a dumping ground of ideas for me, and in part because I don't desperately need it. There's plenty of bad in the software industry, but I'm able to swing the hammers that already exist. Ox isn't so much a hammer as it is an exercise in the ongoing search for what a better hammer would be.

Part of this search comes from my judgments of the tools I use on the daily and their failures. The most significant part however is how the decisions I've made in what prototypes I have built have shaken out.

It would have been easy enough to crank out an RⁿRS interpreter and call it Ox. It just wouldn't have added anything to the software ecosystem or solved any problems I actually had. In fact most of the problems I've tried in one way or another to approach in Ox are directed by my programming experience.

So let's start with the motivation and what I was trying to get at in the last cut I took at Ox.

Clojure is a nice language which does a good job of trying to provide sane defaults. Its data structures are immutable by default; it features destructuring, macro sugar for binding values in structures to locals; it interoperates nicely with the JVM; and it has a namespace system which makes code organization fairly predictable.

Because Clojure uses a single pass compiler and its compilation unit is a single form rather than a file/module or group thereof, it can be difficult to write Clojure whose module structure reflects its logical structure. Cyclic dependencies between modules/namespaces are difficult to achieve and have poor ergonomics. In fact, cyclic dependencies within a single namespace are - if not difficult to achieve - then at least require an effort which tends to prohibit their use.

This has a couple of consequences, well exemplified by the language implementation itself. The namespace clojure.core is a huge namespace containing many groups of related functions which would do just as well if not better in their own namespaces. Big namespaces which don't have a unifying concept are difficult to approach as a new user and difficult to document. Furthermore, they create artificial contention over the "good" names, and tend to lead to the use of prefixes.

Perhaps most interesting is the consequences the lack of circular dependency support has for the language's own implementation. The first several thousand lines of clojure.core serve to bootstrap the language using mainly JVM inter-operation to Java interfaces and the Java implementation of Clojure's various built-ins. Some of this bootstrap code is unavoidable, but parts of it could be avoided if the implementation was able to make forward use of symbols which weren't defined yet.

With explicit forward declarations in some cases this could be made possible today within a single namespace such as clojure.core. The most interesting use cases of circular dependencies however require circular dependencies across module boundaries. For instance what if we could have a clojure.core.seqs namespace containing all the operations related to the sequence abstraction, and so-forth. A large number of small namespaces would be more modular, easier to break out of the standard library into library packages where possible.
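
For concreteness, here's a hedged sketch of what the single namespace case already looks like in Clojure today; my-even? and my-odd? are made-up examples, not core functions.

(declare my-odd?)

(defn my-even? [n]
  (if (zero? n) true (my-odd? (dec n))))

(defn my-odd? [n]
  (if (zero? n) false (my-even? (dec n))))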

And after all, Ox is a pie in the sky project so what would it take?

Well - circular dependencies on functions are pretty easy. They don't have a data dependency on each other at compile time; okay, they do, but letrec is easy to write, and rewriting modules as letrec to support mutual function recursion is well understood and a traditional Scheme trick.

For instance if I had the nonsensical module

(define (f a b)
  (c a))

(define (g a b)
  (c b))

(define (c x)
  (+ x 1))

This could be implemented instead by re-writing it to the letrec form

(letrec ((f (lambda (a b) (c a)))
         (g (lambda (a b) (c b)))
         (c (lambda (x) (+ x 1))))
  ...)

Implementing circular dependencies Scheme style via letrec works because the top level special forms (define etc.) are well understood and can be re-written into letrec bindings as above.

The idea I had at the time was that REALLY the core problem here is that the evaluator can't figure out the dependency graph between functions and macros because we have an indeterminable number of un-expanded macros. A solution to this problem is to simply make the dependency graph statically visible.

In Scheme, the above letrec conversion is trivial because the top level form define is well understood by the implementation and so can be rewritten into a form where circular references will function as desired. The take on this in #29 is to restrict top level forms to a small set of forms which explicitly state what definitions they provide. For instance def, defn, deftype and so forth all explicitly name the definitions they create.

(ns example)

;; This provides the definition example/foo. `def` is well known, so
;; this form can be understood as a definition at read time without
;; any effort to evaluate the body or docstring just like `define`.
(def foo
  "Some docs"
  1)

;; Same as above, `defn` is well known and can be statically parsed
;; at least to the degree we need to understand that this defines
;; the symbol example/qux for dependency analysis
(defn qux
  "Some docs"
  ([x] "arity 1")
  ([x y] "arity 2")
  ([x y z & more] "arity 3+"))

;; `defmacro` is just `defn` with the macro flag and some tweaks to
;; accept implicit arguments like `&form` and `&env`.
(defmacro deftype
  ;; ...
  )

;; deftype, definterface and friends all nicely fit this pattern to
;; the extent one may choose to implement them as special forms.

This all works pretty well. The problem is what happens when a macro occurs at the top level. Doing macros this way doesn't work, especially when there are sequential dependencies between the macros and the functions in the module. While the macros can be re-written as letrec bindings, the semantics of macro expansion don't fit that mold because the macros have to be expanded before the functions (or the other macros!) using them become executable.

Consider something like this...

(deftype Foo "A foo record." [a b c])
;; We now have the functions `->Foo`, `Foo?` and some friends

(defn foo-or-bar [a]
  (or (Foo? a)   ;; Where does `Foo?` come from? Nothing at the top
                 ;; level trivially provides `Foo?`, but we have to
                 ;; find it in order to compile `foo-or-bar`.
      (Bar? a)))

I thought about this for a long time because I think there's a clear case for supporting mutually recursive definitions with the most generality possible. And the first solution I came up with is the one implemented in ox-lang/ox#29.

By defining a new top-level special form, providing, users can just annotate their top level macros with the symbols they're expected to produce! Problem solved! This lets Ox determine which top level forms need to be evaluated to see if they provide macros, and perform lazy loading by doing functionally free dependency analysis before any evaluation.

So for instance

(providing [->Foo, Foo?]
  (deftype Foo "A foo record." [a b c]))

(defn foo-or-bar [a]
  (or (Foo? a) ;; The reader understands `providing` and can determine
               ;; that to get the needed `Foo?` definition the
               ;; previous form needs to be evaluated.
      (Bar? a)))

The problem with this approach - and the reason I call it a mistake - is that it greatly complicates load time and neuters definition writing macros. As noted earlier, there's plenty of good reason to want to write macros that write definitions. However a good bit of the syntactic power of such macros is lost if you have to provide a list by hand of definitions the macros will create for you.

Frankly this made implementing code loading so complicated that, what with trying to bootstrap Ox in Java, I couldn't do it. This decision proved absolutely fatal to that particular run at the Ox project.

There are, now that I've seen the failures of this approach and had some hammock time, other ways to achieve this same end. For instance if you don't allow macros to write macros, you can play games where macro definitions are letrec rewritten as above, and the definitions of "normal" functions or expressions are delayed until forced by a macro expansion. This lets functions and macros mutually recur so long as you can ground out a macroexpansion to some concrete form or function of forms that can be evaluated without unbounded mutual recursion between functions and macros. Which, honestly, is the best I think you can do.
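
A minimal sketch of that delayed-definition idea, written in Clojure for concreteness; none of these names are an existing Ox or Clojure API.

(def ^:private definitions
  "Map of symbol -> delay which evaluates the defining form on demand."
  (atom {}))

(defn declare-definition!
  "Record a top level form without evaluating it yet."
  [sym form]
  (swap! definitions assoc sym (delay (eval form))))

(defn resolve-definition
  "Force a definition the first time something - such as a macroexpansion -
  actually needs it."
  [sym]
  (some-> (get @definitions sym) force))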

The advantage of Clojure's single pass approach here is a huge complexity win which avoids all these considerations, and the fact that the Clojure read/evaluate/print loop has exactly the same structure as code loading.
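
That structural similarity is easy to see in a toy loader: loading is just read and evaluate in a loop, i.e. the REPL minus the print. This sketch assumes rdr is a java.io.PushbackReader over the source file.

(defn load-forms!
  "Read and evaluate every form from rdr, in order, in a single pass."
  [rdr]
  (loop []
    (let [form (read rdr false ::eof)]
      (when-not (= form ::eof)
        (eval form)
        (recur)))))

;; e.g.
;; (with-open [rdr (java.io.PushbackReader. (clojure.java.io/reader "src/example.clj"))]
;;   (load-forms! rdr))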

Anyway. Languages are hard. That's what makes them fun.

More when I've got better ideas.

^d

Intermediate Abstraction

This talk was presented at the Bay Area Clojure meetup, hosted by Funding Circle. These notes are presented here with thanks to the organizers and attendees.

1 Preface

Hey folks! For those of you who don't know me, I'm Reid McKenzie. I've been writing Clojure on and off for about four years now. These days I work for Twitter as a Site Reliability Engineer.

Twitter Site Reliability is really small. We're not even 150 engineers tasked with ensuring that all the infrastructure and services required to support the business stay online. It's clear that as the business continues to grow, we in SRE can't continue to be traditional reactive system operators. We must be software engineers developing automation and designing systems capable of intervention-less recovery.

Achieving quality in software engineering means we must develop an understanding of how to exercise our powers of abstraction, and that we appreciate and manage the complexity with which we are surrounded. This talk presents some formal and philosophical concepts intended to elevate the ways we approach problem solving, which I hope will be useful to you.

2 Context

Programming is a young profession all things told.

If we trace the roots of our profession to Lovelace, as she was the first to write of a machine which could be used for general purposes and could perhaps program or modify itself, then our profession, such as it is, dates to the 1830s. Just shy of the two century mark.

Date       Who               Event
1928       Hilbert           Entscheidungsproblem
1933       Gödel, Herbrand   μ-recursive functions (simple arithmetic)
1936       Church            λ calculus, models Entscheidungsproblem with reduction
           Turing            Turing machine, models Entscheidungsproblem with halting
1941       Zuse              First programmable, general purpose computer
194{5,6}   Von Neumann       Von Neumann single memory architecture

1947       Bell Labs         Transistors are invented
1950       USGOV, Banks      Computers become commonplace for census & data processing
1970       IBM, Commodore    First transistorized PCs come to the market
1976       Wozniak           Apple I

2017                         Us, here, today, floundering variously

Computation as a science is grounded in the concepts of evaluation and analysis. The science is in the theoretical limits of what a machine could do. The question of how to achieve something on a given machine is the province of software engineering.

Software engineering is a younger discipline still. We've only really been writing software since the '50s (Excepting Lovelace's programs which never had a machine), so call it 60 years of recognizable programming during the course of which huge shifts have occurred in the industry.

In retrospect one can only wonder that those first machines worked at all, at least sometimes. The overwhelming problem was to get and keep the machine in working order.

– Dijkstra "The Humble Programmer" EWD340 (1972)

The problem with software engineering is, well, for the most part we never get to do it. The profession of programming as adopted in industry is rooted in the now seemingly long tradition of just getting the machine to work at all and then "debugging" it, a tradition which has carried over since the first programs in the '50s. The operations tradition from which SRE programs are usually developed is particularly guilty of this.

Rather than have a concrete body of science to reference and with which to inform decisions, every working programmer develops an aesthetic taste for what seems to them like the right way to solve problems; what kept the computer or systems of computers in working order and solved the problem best. The problem with taste is that it is deeply subjective. It reflects our personal experiences in education, what javascript frameworks we've played with on the side and how we did it at $LASTJOB. When tastes conflict, they can't inform each-other except with great difficulty because there's no underlying theory which can be used to relate and moderate them.

The question is how do we earn the E in our titles. And the answer is not to continue artistically operating the systems we have or continuing to do what seems to work. We must develop and communicate formal understandings for the tools and tactics we use.

3 Complexity

Simple; Simplex; To have but a single strand, to be one.

Complex; To be woven from several strands.

Complect (archaic); To plait together.

Easy; from the French aise, meaning comfortable and tranquil. Not difficult, requiring no great labor or effort.

– Rich Hickey "Simple Made Easy" (2011)

Often, we misuse simple to mean easy.

Ideally, we want to write code which is both simple and easy. This requires that we be conscious both of complexity and of ergonomics. However we only rarely have objective criteria for these two critical dimensions.

The fact that a perfectly reasonable engineer may consider a tool sufficient and easy while another may just as reasonably consider the same tool baroque and overbearing is a core motivator for this talk. Sharing an objective metric of simplicity clarifies otherwise subjective questions and enables us to agree upon measurements of solution quality.

Thankfully there is prior art on the subject of attempting to estimate complexity. Karl Popper's Logic of Scientific Discovery builds a framework of formal logic with which to formalize the scientific process itself.

Popper bases his project upon falsifiability and corroboration through experiment. A hypothesis for Popper is a formal statement which is falsifiable because it implies testable outcomes. An experiment cannot confirm a hypothesis, because to do so would imply that no other experiment (state of the entire universe) could possibly falsify the hypothesis. However, we can fail to falsify the claims of the hypothesis, and we can even show that the experiment produced a state which corroborates the prediction(s) of the hypothesis.

Popper proposes that the complexity of a theory can be measured by reasoning about the set of states of the universe which would falsify the theory. These sets can be related to each other and relative complexity measured by establishing implication and subset relations, but this is far from enough to be a workable rule of measurement. Comparing infinities (sets of states of the universe) is hardly intuitive.

The good news is that there are ways we can do this!

Cyclomatic complexity (McCabe) measures the number of control paths through a program. This approach works, and even gives an obvious heuristic for recognizing complexity in the simple number of branches but it is uni-dimensional and doesn't capture the fact that our programs have and manipulate state.

Dataflow complexity is a related family of metrics which try to measure the possibility space of the data (state) in a program. Unfortunately, beyond giving an intuition for the idea that complexity grows with the memory footprint of a program there's not an efficient way to quantify or relate this kind of complexity.

We get into this mess of having to count states in order to talk about complexity because both of these complexity estimators use the model of a Von Neumann finite state machine whose states cannot easily be related to each other.

Going back to Popper, if we have two logical reduction statements P and Q, we can estimate their relative complexity by simply comparing the number of degrees of freedom of each expression. Likewise, using the μ calculus (simple arithmetic and reduction) or the λ calculus (function application and reduction), we instantly regain the property that we can understand the complexity of an expression or program merely by looking at the number of terms it involves and counting them.

This intuition is supported by cyclomatic complexity and dataflow complexity metrics because (for programs which halt under reduction) the complexity of a program written with reduction is, well, one. Each datum and program point is unique and occurs only once.

(λx.x x) (λx.x x)

→ (λx.x x) (λx.x x)

This must be an estimate, not a precise metric, as a divergent combinator such as ω will mess this whole thing right up. But per the halting problem there's not much we can do here, so if we accept (as we must) that this is an incomplete analysis and restrict ourselves to the domain of terminating expressions, we can still get a lot of utility out of this rule of thumb.

Now, we don't write purely functional programs, and when we do write programs which use functional tools and abstractions there's still global state lying around because we compute on Turing machines not graph reduction engines (those are really hard to build). How do we estimate complexity for real programs then?

It has been suggested that there is some kind of law of nature telling us that the amount of intellectual effort needed grows with the square of program length. But, thank goodness, no one has been able to prove this law. And this is because it need not be true.

– EWD, "The Humble Programmer"

There is room for reasonable disagreement here, but I'd propose a very simple heuristic for estimating complexity: the product of the following properties of a program or program segment.

  1. 1 + Number of input parameters
  2. 1 + Number of branches
  3. 1 + Number of back-edges
  4. 1 + Amount of REACHED CAPTURED state which is accessed (maybe 2x cost of a parameter)
  5. 1 + Amount of REACHED CAPTURED state which is modified (maybe more than 2x the cost of a parameter)

;; For instance, a simplified `map`:
(defn map [f coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (cons (f (first s))
            (map f (rest s))))))

That is, a function which accepts more parameters and takes many branches using them is more complex than a function which accepts a function and a few values as parameters and merely delegates to the argument function. The real branching behavior of such an applied higher order function may be complex, but the function itself is simple. It captures little state and does little directly.
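
As a rough illustration of the arithmetic, here's a hedged sketch of the heuristic as a function; the 2x and 3x weights for captured state are assumptions of mine, not part of any established metric.

(defn complexity-estimate
  "Multiply out the heuristic above for a program segment described by counts."
  [{:keys [params branches back-edges reads writes]
    :or   {params 0 branches 0 back-edges 0 reads 0 writes 0}}]
  (* (inc params)
     (inc branches)
     (inc back-edges)
     (inc (* 2 reads))    ;; captured state read, weighted ~2x a parameter
     (inc (* 3 writes)))) ;; captured state written, weighted more heavily

;; The simplified `map` above: two parameters, one branch, one back-edge
;; (the self call), no captured state touched.
(complexity-estimate {:params 2 :branches 1 :back-edges 1})
;; => 12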

(defn log*
  "Attempts to log a message, either directly or via an agent; does not check if
  the level is enabled.
  For performance reasons, an agent will only be used when invoked within a
  running transaction, and only for logging levels specified by
  *tx-agent-levels*. This allows those entries to only be written once the
  transaction commits, and are discarded if it is retried or aborted.  As
  corollary, other levels (e.g., :debug, :error) will be written even from
  failed transactions though at the cost of repeat messages during retries.
  One can override the above by setting *force* to :direct or :agent; all
  subsequent writes will be direct or via an agent, respectively."
  [logger level throwable message]
  (if (case *force*
        :agent  true
        :direct false
        (and (clojure.lang.LockingTransaction/isRunning)
             (*tx-agent-levels* level)))
    (send-off *logging-agent*
              (fn [_#] (impl/write! logger level throwable message)))
    (impl/write! logger level throwable message))
  nil)

...

(declare ^:dynamic *logger-factory*)

...

(defmacro log
  "Evaluates and logs a message only if the specified level is enabled. See log*
  for more details."
  ...
  ([logger-factory logger-ns level throwable message]
   `(let [logger# (impl/get-logger ~logger-factory ~logger-ns)]
      (if (impl/enabled? logger# ~level)
        (log* logger# ~level ~throwable ~message)))))

An expression which does little, but does so using values which it draws from global state such as a configuration value, may still be simple, but it is not so simple as a function which accepts that same configuration structure directly as a parameter. This complexity becomes evident when testing: a test for the simple function merely requires calling it with the appropriate configuration value, whereas testing the globally configured function requires manipulating the state of the entire application so that the shared mutable configuration state conforms to the test's requirements.

We can see that the log macro is actually quite complex because it reaches a significant amount of global state, despite the fact that the macro itself doesn't actually do that much. The log macro has to check a global mutable registry mapping namespaces to logger implementations for a logger, check whether that logger is enabled and then do the logging, potentially manufacturing a new logger using the factory from global state if it has to.

(defn mongo!
  "Creates a Mongo object and sets the default database.
      Does not support replica sets, and will be deprecated in future
      releases.  Please use 'make-connection' in combination with
      'with-mongo' or 'set-connection!' instead.
       Keyword arguments include:
       :host -> defaults to localhost
       :port -> defaults to 27017
       :db   -> defaults to nil (you'll have to set it anyway, might as well do it now.)"
  {:arglists '([:db ? :host "localhost" :port 27017])}
  [& {:keys [db host port]
      :or   {db nil host "localhost" port 27017}}]
  (set-connection! (make-connection db :host host :port port))
  true)

A function which changes state may be complex compared to a function which produces no effects and returns a value, but it introduces far more whole program complexity if global state is modified especially if it is modified many times.

This supports the intuition that sorting a list received as an argument in place doesn't add so much whole program complexity, because the effect is restricted in scope to the caller and to wherever the list to be modified was sourced from, whereas reading and manipulating the errno global value in a C program may directly impact any number of other program behaviors. Such modification defeats referential transparency, and forces us to use sequential logics at the level of the whole program to achieve understanding.

We use multiplication rather than addition to reflect that really what we're trying to approximate is the VOLUME of a multi-dimensional state space, which is measured by the product of the length of the sides, not their sum.

With apologies to Dijkstra, I note that it is quite possible for a program's complexity by this metric alone to grow at least in proportion to the square of the program's length. However there's also still plenty of room at the bottom for simple programs to achieve near-linear complexity growth.

4 Abstraction

We all know that the only mental tool by means of which a very finite piece of reasoning can cover a myriad cases is called “abstraction”; as a result the effective exploitation of his powers of abstraction must be regarded as one of the most vital activities of a competent programmer. In this connection it might be worth-while to point out that the purpose of abstracting is not to be vague, but to create a new semantic level in which one can be absolutely precise.

– EWD, "The Humble Programmer"

What is an abstraction?

Abstractions can be considered to consist of a model, interface and an environment.

The model of an abstraction is the set of operational semantics which it exposes to a user. For instance a queue or channel as an abstraction provides the model that a producer writes to it and a consumer reads from it. Reads may be in some order such as FIFO or LIFO, or un-ordered with respect to writes. A bounded queue may provide the additional semantics that, if the writer outpaces the reader, it can only get so far ahead before the queue put operation blocks the writer. A Communicating Sequential Processes (CSP) style queue provides the operational semantics that when a write occurs the reader is awakened, and so only the reader or the writer is ever operating at once.

The interface is the set of operations which the abstraction provides to the user. For instance the channel can have elements enqueued and dequeued. Perhaps the queue can also be closed at one end, denoting that either the source or the destination is finished whereupon the other end will eventually (or immediately) also close.

Abstractions don't exist in a vacuum. They have expectations which they may demand of the outside world (externalities) such as restrictions on how they are used or in what context they are employed. Everything that the model omits is ultimately an assumption about the environment.

It may be helpful to note that the environment is the dual of the model. Anything that is not part of the model must be part of the environment, and by growing the model we can shrink dependency on the environment.
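
To make the three pieces concrete, here's a hedged Clojure sketch of a bounded queue abstraction; the protocol and its docstrings are illustrative, not an existing library. The protocol is the interface, the behavior promised in the docstrings is the model, and everything left unsaid (threading, memory, who closes the queue) is pushed onto the environment.

(defprotocol BoundedQueue
  (put!   [q x] "Enqueue x, blocking the writer once the bound is reached.")
  (take!  [q]   "Dequeue the oldest element, blocking until one is available.")
  (close! [q]   "Close the queue; pending takes drain, further puts fail."))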

This is deeply significant, in that mismatch between environments and reality results in friction for the user, and tends to be resolved not by abandoning the abstraction but by humans adapting to the restrictions imposed upon them by inadequate tools.

Abstractions may be compared - one said to be simpler than the other - on the basis of the size of the model, the interface and the expectations imposed on the environment. The model after all is a space of states, the interface a set of functions over that space, and externalities may be either a set of propositions or a space of states, as the two are equivalent.

The ergonomics of abstractions may also be compared - one said to be more consistent than the other - on the basis of the consistency of the names and structure of the abstraction's interface with each-other and predictability of the relationship between the interface, the model and its externalities.

4.1 Examples of Abstraction

Per the Dijkstra quote above, abstraction is all about changing or choosing semantics. I'll give a treatment of linguistic abstraction shortly, but by naming things we can build semantic levels - languages - which convey what we want.

We can abstract over control flow by using higher order functions which provide, for instance, conditional execution, giving us a tool with which we can talk about conditionally applying a transformation.
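
For instance, a hedged sketch of such a higher order function; apply-when is a made-up name, not a core function.

(defn apply-when
  "Apply f to x only when (pred x) holds; otherwise return x unchanged."
  [pred f x]
  (if (pred x) (f x) x))

(map (partial apply-when neg? -) [-1 2 -3])
;; => (1 2 3)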

We can abstract over values by choosing types or decomposing values into their components, using aggregate logical values and references which enable us to refer to parts of our data.

However it is not abstractive to perform computation. When we "compute" (reduce) an expression, we discard information irrecoverably. For instance an average is not meaningfully an abstraction over a time series. It is a datom, the product of some other irrecoverable data under a function. By computing this new value, we've given ourselves an aggregate view which no longer carries the same semantics and cannot meaningfully be considered to be equivalent to its source.

4.2 Limits of Abstractions

Unfortunately, abstractions do have limits.

Traditional types which you may be familiar with, your int and char, are really just raw presentations of what the physical machine can do with a given value. Rarely however do these presentations of machine capabilities map to the senses which are meaningful to us. They are the poorest possible of abstractions - none at all.

#include <stdio.h>
#include <stdint.h>

#define EVAL(...) printf("%s: %d\n", #__VA_ARGS__, __VA_ARGS__)

int main(int argc, char** argv) {
  EVAL((int32_t)(1<<31)-1);
  EVAL((int32_t)(~(1<<31)));
  EVAL((int32_t)(1<<31));
  return 0;
}

(int32_t)(1<<31)-1: 2147483647
(int32_t)(~(1<<31)): 2147483647
(int32_t)(1<<31): -2147483648

How many times in Java have you actually wanted precisely a value with 2³² states, of which 2³¹-1 states are said to represent positive numbers, 2³¹ states represent negative numbers and the zeroed bit string represents … zero. And arithmetic wraps around, flipping the sign should we exceed the bounds of [-2³¹, …, +2³¹-1]. We also know that while we have a vector of bits … somewhere, and one of them is a sign, and we can count on the 32nd bit being the sign, we can't actually make endianness assumptions about the bytes in our integer(s). So much for bitmasking tricks.

This abstraction of an int type provides addition, subtraction, multiplication, division, absolute value and of course comparison. All of which we associate with the concept of an integer. Furthermore this abstraction supplies, by specifying the model of the machine representation, all the operations we customarily associate with a vector of bits - and, or, xor, shifting, inversion.

If all we want is addition and comparison, the rest of this behavior is not just semantic baggage, it adds complexity. What if I want to count to 2³¹? I have to be aware that this abstraction's model says I'll get -2³¹, not the value I expect. Or perhaps an out of band signal that an IntegerOverflowException occurred, which constitutes another cross-cutting concern, because now the abstraction implies a system of exceptions with which I must be familiar.

For a large domain of problems, this abstraction does well enough. However we must be constantly aware of its failure modes. We as professionals bear a constant cognitive cost in conforming ourselves to the limits which this abstraction presents to us.

It should be recognized here that Haskell, Lisp(s) and other languages which provide arbitrary precision integer types out of the box capture a simpler programming model by default. A model which has more implementation complexity to be sure, and which consequently expects more of the environment (for instance the presence of a memory manager), but the interface is smaller and less thorny.
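
Clojure on the JVM makes the trade visible: the default + on longs signals overflow, while +' quietly promotes to arbitrary precision at the cost of assuming boxed numbers and a garbage collector in the environment.

(+ Long/MAX_VALUE 1)
;; ArithmeticException: integer overflow

(+' Long/MAX_VALUE 1)
;; => 9223372036854775808N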

4.3 Inheritance, Composition & Re-use

Software quality is defined first by solving the business needs of the here and now. If software or strategy or any other tool doesn't do that, it's worthless. Software is a form of physical capital in the same sense as a steam engine or a factory. A factory which can be re-tooled to produce articles of a different design is more valuable than a factory which will require full replacement should the product change. Likewise an engine or assembly line which must be significantly reworked in order to add capacity or change task is less valuable than a design which naturally features a pattern along which it can be changed without significant alteration.

Object Oriented Programming (that is, programming via encapsulation and inheritance) promised that it would improve, among other things, software reusability when it broke onto the scene.

The problem with inheritance as a strategy in practice is that any tool which wants to participate in some abstraction is forced to conform to the expectations of the interface established by the parent in which the child wishes to participate. The interface must accumulate.

Furthermore, reuse through inheritance has the problem of capturing mutable state. An extension or wrapper class must be intimately familiar with the state of the class it wraps and the expectations it may make. The model accumulates.

In "Simple Made Easy" Rich gave a good treatment of this, pointing out that code partitioning doesn't imply decoupling and pointing to Interface Oriented Programming is the response to this problem, because interfaces only bring with them their direct set of expectations.

To be precise about what's happening here - the model of inheritance oriented programming forces us to accrete interfaces and externalities. Thankfully this is not an essential property of abstraction, merely a property of this particular technique.

(defn mapcat [f & colls]
  (apply concat (apply map f colls)))

When we take two abstractions and put them together, we say that we've composed them. For instance two functions compose together to make a third. The map function composes with the concatenate function to give us mapcat. map lets us apply a function over many values, and concat allows us to join sequences together. We can thus define mapcat to be a function which lets us join the results of applying a function which produces sequences to many values. The composite model of mapcat requires only the additional constraint that f: a → [b], that is the function to be mapped produces results which can be concatenated.
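
A quick usage sketch of the composite (nothing beyond clojure.core assumed):

(mapcat (fn [x] [x x]) [1 2 3])
;; => (1 1 2 2 3 3)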

;; A fixed function of one collection, built by partial application.
(def evens (partial filter even?))

We can build abstractions which have smaller interfaces. For instance if we take a function of a configuration and a datom and we partially apply it with a configuration, we now have a fixed function of a datom. Its model is smaller - we know how it will behave because we've specified whatever configuration(s) the larger base model depends on and the interface is smaller - it consists only of the datom to manipulate.
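
A hedged illustration of that configuration example; greet and its configuration map are hypothetical.

(defn greet
  "A function of a configuration and a datom."
  [config name]
  (str (:greeting config) ", " name "!"))

;; Partially applying the configuration yields a fixed function of the datom.
(def greet-en (partial greet {:greeting "Hello"}))

(greet-en "Rich")
;; => "Hello, Rich!"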

Building a small wrapper around a large Java API would be a good example of such a composite abstraction with a smaller interface.

We can also build composite abstractions which traverse the model and externality trade-off. For instance we can build abstractions which simplify the set of externalities by providing input or state validation where previously we had unchecked expectations. This expands the model, since now the expectations are explicitly modeled and checked. We can also reduce the complexity of the model by choosing to make assumptions.
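
A hedged sketch of that trade-off; both functions are illustrative rather than drawn from any existing library. transfer! leaves the positive-amount expectation to the environment, while checked-transfer! pulls it into the model and checks it.

(defn transfer!
  "Assumes, but does not check, that amount is positive. from and to are atoms."
  [from to amount]
  (swap! from - amount)
  (swap! to + amount))

(defn checked-transfer!
  "The same operation with the expectation modeled and validated."
  [from to amount]
  (when-not (pos? amount)
    (throw (ex-info "amount must be positive" {:amount amount})))
  (transfer! from to amount))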

4.4 Naming

Credit: Zachary Tellman, Elements of Clojure (unreleased).

Linguistic abstraction in the form of naming is the most fundamental tool available to us as both artisans and engineers.

Names being the tools we use to understand our tools, choosing good names (whatever that means) is of primary importance. But what characterizes a "good" name?

Frege suggests that a name consists of three properties: the sign, being the textual representation of the name; the referent, being the entity referred to by the sign; and finally the sense, being the properties ascribed to the referent.

The traditional example is that Phosphorus (the morning star) and Hesperus (the evening star) are both Greek names for celestial bodies which we now understand to both name the planet Venus. The sign and referent are, according to Frege, insufficient to understand these two terms, which are prima facie synonyms and yet don't actually interchange.

A good name must satisfy two properties - it must be narrow and consistent.

A narrow name excludes things it cannot represent; its sense is only as broad as the context of its use requires. For instance if the only expectation to be made of a reference is that it will be a mapping, then map or its contraction m is a perfectly acceptable name if that is the only sense in which the referent is used.

A consistent name shares the same sense for a referent as is used for that referent in other related contexts. To continue the example of a mapping, to subsequently sign the same referent as dict elsewhere would be less than ideal, both because dict implies a specific kind of mapping and because it thus no longer shares the same sense. A simple mapping m may only imply clojure.lang.ILookup, whereas hashmap implies java.util.HashMap and side-effectful update. Naming and naming conventions are tremendously important in the context of dynamically checked systems, wherein names largely depend on their sense to communicate the type, or at least the intent, of the use of the referent.

Let's consider some examples (adapted from Zach's fine book).

;; This is a Very Bad expression.
;;
;; The left referent is very broad, when we clearly use it in a narrow sense. It
;; should be more narrowly named.
;;
;; The value we get out of our broad referent is named very narrowly. The single
;; string contains far too much sense.
(get m "sol-jupiter-callisto")

;; This is a better equivalent expression, because we communicate a more
;; appropriate sense for our previously over-broad sign, and we unpacked the
;; previously overly precise key into a set of more general structures each of
;; which communicates less sense.
(get-in starsystem ["jupiter" "callisto"])

;; Better yet capture the logical operation, the fact that Callisto is a moon of
;; Jupiter is an assumption in all of the above expressions.
(get-body-by-name system "callisto")

;; It would also be better to be completely precise about the sense of the name
;; Callisto by doing something like
(-> (get-system "sol")     ; This communicates that we want to get a "system" named sol
    (get-body "callisto")) ; And we care about the component "body" named callisto

;; This expression further communicates the sense that Callisto is a moon of Jupiter.
(-> (get-system "sol")
    (get-body "jupiter")
    (get-moon "callisto"))

;; Both of these two latter expressions are better because more of the relevant
;; sense of the name callisto is captured explicitly, rather than being implied
;; upon or expected of the seemingly broad mapping referent.

4.5 Achieving Abstraction

As suggested in the discussion of the consistency of signs, part of the problem here is that we're forced by the tools we use to project our sense directly upon the sign.

Types: A heresy

                   static sense       dynamic sense
static dispatch    Haskell, C, Java   Java (class casting)
dynamic dispatch   C (dyn linking)    Smalltalk, Python, Ruby

Static type systems provide a model with which we can talk about the sense we associate to a name, by capturing at least some of the sense in which we intend to use the referent as a type. If we can determine a type (and the type must conform to a type of which we knew ahead of time), then we can begin to understand the behavior of our systems through the lens of the types we've assigned to our data.

Some systems allow us to use this information statically to check that at least some part of the sense or type is consistently respected in the use of a referent.

Some systems enable you to capture more sense than others, by encoding the sense statically in what we usually call a type.

Type systems can be compared in terms of how much sense they enable you to capture, and how difficult it is to do so.

Let's say that we wanted to count up from zero, to continue the example of positive integers from above. We could choose a sign with which to reference our value which indicates that it's an index, such as the traditional i, or idx, or even index depending on individual taste. But sometimes we want to play clever tricks with the fact that negative indexes are sometimes defined, or we want to be tricky and reach back to some location "before" where we're currently indexed in memory, and so forth. The choice of sign implies only sense; it cannot constrain the referent.

However, we can choose abstractions which let us do so. For instance, we could build an abstraction which only allows us to capture ℕ, the natural or counting numbers.

from collections import namedtuple
from functools import total_ordering

@total_ordering
class N(namedtuple("N", ["value"])):
  """A numeric value constricted to N (the counting numbers)."""

  def __new__(cls, value):
    if value < 0 or isinstance(value, float):
      raise ValueError("N is constrained to [0, 1, ...)")
    return super(N, cls).__new__(cls, value)

  def __int__(self):
    return self.value

  def __add__(self, other):
    return N(int(self) + int(other))

  def __sub__(self, other):
    return N(int(self) - int(other))

  def __eq__(self, other):
    return int(self) == int(other)

  def __lt__(self, other):
    return int(self) < int(other)

This isn't so different from the full Haskell implementation of Numeric.Positive.

There are techniques which can be used to try and capture more sense. This technique is best known as "smart constructors". Rather than restrict the sense to the sign, we've defined a type which can only take on values conforming to our desired sense.

The additional semantic restrictions we can impose on ourselves via the type free us from either having to check the sense ourselves over and over again, which is the most verbose and fragile path, or from settling for the integer sense we get "for free" and hoping that it's enough.

This tactic can be applied to any domain where we want to give a type to a subset of some other space of values. For instance strings of a known format. It is just window dressing, but frequently it's enough that we can approximate the semantics we actually want.

To take another example, what if we were to try and make a type for a Start of Authority (SOA) version? We could use a simple string, after all that's what the SOA becomes when we lay it out in a zone file, and the only requirements we have of an SOA is that it increase to define an ordering among zone file versions.

Generally however, strings (even constrained strings!) should be avoided. Text is the least structured form of data. Parsing is, even with the aid of parser combinators and regular expressions, difficult and error prone. It's far better to use fully realized data representations which can be rendered to strings than to keep data as a string and decode it.

Which carries more semantic meaning - the string "20170706001", or the tuple

from collections import namedtuple
from scanf import scanf  # 3rdparty

class SOA(namedtuple("SOA", ["year", "month", "day", "version"])):
  """A tuple capturing a DNS SOA format."""

  PATTERN = "%04d%02d%02d%03d"

  def __new__(cls, text):
    """Decode a SOA string, returning a SOA tuple."""
    return super(SOA, cls).__new__(cls, *scanf(cls.PATTERN, text))

  def __str__(self):
    """Encode this SOA as a string"""
    return self.PATTERN % self  # self is an appropriate 4-tuple already

print SOA("20170706001")
# SOA(year=2017, month=7, day=6, version=1)

The tuple can be rendered to and parsed from text as required, and above all it captures the sense in which we use SOAs which leaving the SOA as a plain string does not. If strings are just passed through and not manipulated, then perhaps it's acceptable to keep them represented as strings and skip decoding but structures like this which capture more of the sense of our data should be preferred.

Unfortunately there's often lots of other sense we may care about - for instance that we have a number within the index bounds of some structure, or that for any possible value there exists a record in the database with an id primary key of that value. Checking these senses statically may well be impossible. There could be a concurrent change to the database, a once valid identifier could become dangling, and so forth.

Static types in the Haskell sense of things help; they allow you to capture some of the relevant sense - enough in my experience that software development may be easier because the type system helps you, especially when you need to change abstractions - but thanks to the incompleteness theorem we know all too well that they can never check every property we may care about.

4.6 Utility Is Contextual

[Image: Plato's cave]

It's tempting to think that we can, through great skill or sheer luck, discover perfect abstractions; Platonic ideal or Martian crystalline forms of a concept which satisfy all our needs for all time. The problem with this view is that different contexts have different needs. Many objects may generate the same shadow on the wall of Plato's cave.

For instance above I attacked the "stock" numeric abstractions which include hardware implementation details, suggesting that arbitrary precision math is a better default. I stand by that argument, but it must be recognized that there are occasions when we must make different tradeoffs in choosing our models.

Sometimes we need to interface with the machine. Sometimes we want to use (or author) bit vectors. Sometimes we don't have a memory manager and can't afford truly arbitrary precision arithmetic. Sometimes the performance degradation of operating on a double precision floating point number as opposed to a single or half width floating point number is meaningful, and we must make a choice to increase complexity and sacrifice ergonomics in the name of utility. That some machine type is the shared interface of some other abstraction, or that the externality of the machine's particular wall time performance matters to our context, changes what is appropriate.

Just as we should exercise caution in choosing our names so that we don't choose overly narrow or broad names, so we must exercise caution in choosing our abstractions. The simplest possible abstraction with the fewest possible externalities may well not be the most appropriate to the domain, and worse still a simple abstraction is not likely to lend itself to change. Like other crystals, abstractions shatter or shear rather than bend from their natural shape.

Coupling of abstractions - allowing the tools you choose to share externalities and to expect one another - is dangerous because it means that even a small deviation from the original requirements and context may prove cause for significant change, with rippling consequences for your entire system.

5 Performance

The greatest performance improvement of all is when a system goes from not-working to working.

– Ousterhout, "My Favorite Sayings"

Performance and efficiency generally are factors in the measurement of how well a tool solves or responds to the needs of the "user", whether they be a person, business or other entity.

A code breaking machine which can decode a daily message in 32 hours isn't sufficient. It can't decode messages fast enough for them to be relevant. Such a machine is not useless, but it doesn't solve the problem set for it adequately. Three such machines together will provide, at a 32 hour delay each, the previous day's message, and perhaps a fourth or even pairs of machines may be supplied for reliability in order to provide a more-or-less continuous feed of intelligence, but it would still be yesterday's data today and delivered at great cost.

In this example we have an objective criterion of how much performance is "enough". The cracking machine must have a cracking rate of at least 1 msg per 24 hrs, and we have an objective measurement that we're only able to crack at the rate of 1 msg per 32 hrs.

There are two primary approaches to achieving performance where it is required. The first approach is to simply do less work - either by reducing the problem using algorithmic optimization, or by reducing the size of inputs by firing customers or negotiating system scope. When optimization must occur, algorithmic or organizational optimization should be by far preferred in the interest of overall program simplicity with all its other virtues.

The second approach is to tune what work must be done to better suit the machinery we use to do it. This is what we usually mean when optimization comes up, although in general we should prefer to find ways to simplify and reduce the problem.

Two opinions about programming date from those days. … The one opinion was that a really competent programmer should be puzzle-minded and very fond of clever tricks; the other opinion was that programming was nothing more than optimizing the efficiency of the computational process, in one direction or the other.

– EWD, "The Humble Programmer"

To be clever means to be "sharp", "quick" and "skillful". The old English roots suggest a sense of "cutting" and "keenness". It would be cleverness to design a tool which does the difficult or seemingly impossible within resource constraints by leveraging intimate knowledge of the system.

This formulation of a "clever trick" implies that a really clever trick must also have a high complexity. This is the root of the trouble with optimizations of the second kind. Clever tricks while perhaps relevant in the moment are in the long term likely to prove troublesome.

Worse, clever tricks of this kind require specializing some part or all of your program so that it reflects performance considerations. This has the particular negative consequence of meaning that your code cannot be so simple and algorithmic anymore - it must be reflective of some state(s) or trick(s) used to achieve acceptable performance because performance is now a crosscutting concern which must be accounted for by your abstractions.

Frequently, reasoning from first principles is not enough to produce fully optimized code. Modern high performance sorting algorithms are absolutely fascinating, because while the algorithmic complexity may suggest that provably optimal algorithms such as mergesort or quicksort should be universally adopted it turns out there are frequently ways to cheat and do better. Modern collections libraries use adaptive sorting strategies which benefit from existing sortedness in the input collection and frequently contain multiple different sorting implementations which are applied to different sizes of collection.

Even apparently simple statements like "doing two string concatenations will be slower than doing one" are impossible to evaluate without a real effort at profiling. The JVM, among other platforms, recognizes that string manipulation is "expensive" due to the need to resize and copy buffers. However, both the Oracle JDK and OpenJDK actually include optimizations which detect code doing trivial string concatenations and silently replace it with equivalent code using the StringBuilder class, which delays buffer resizing as long as possible.

If you want fast, start with comprehensible. You can't make it fast if you can't change it.

– Paul Phillips, "We're Doing It All Wrong" (2013)

Unless there are known constraints - such as the hard throughput requirement here of a message a day - the question "is it fast enough?" and the statements "this isn't performant" or "we need to do this for performance" are totally meaningless. Worse, they detract from the goal, which should be to ship a tool which serves a purpose, and introduce artificial pressure to conform the choice of abstractions to arbitrary false goals.

6 Fin: Purism

Niklaus Wirth's work is notable superficially because Wirth is himself a prodigious hacker. He's responsible for, among other projects, no fewer than five separate programming languages and two operating systems. What's the trick? How does he do it?

In the implementation of the Modula-2 compiler, Wirth and his students used two metrics to estimate and track its quality over the course of its evolution. One was the simple line count of the program - a poor proxy for complexity but a data point none the less. Another was the time which it took the compiler to compile itself.

One of Wirth's students implemented an elegant hash table backed by binary trees for use in mapping strings to symbols - to look up local variables and implement lexical scope. Doing so introduced a new, complex datastructure, increased the line count of the compiler, and added at best no performance benefit, for it turned out most scopes in real programs are small enough that a simple association list could be faster. Wirth ripped the hash table out.

Wirth was also a great user of tools, such as Backus-Naur Form and parser generators for expressing precisely the meaning which he wished to ascribe to a given set of symbols and the notation which his languages should represent. Even in these, Wirth's ascetic attachment to simplicity is clear - all of his languages are simple and can be efficiently implemented without any clever tricks whatsoever.

Simplicity and quality aren't properties which code or systems get by osmosis or by grant of the gods. They are the product of deliberate effort by programmers to resist the entropic pressure applied by reality and customer requirements on systems and to in spite of that pressure carve out cohesive, appropriate abstractions which leave room for growth.

It would be equally absurd, then, to expect that in unjust, cowardly, and voluptuous action there should be a mean, an excess, and a deficiency; for at that rate there would be a mean of excess and of deficiency, an excess of excess, and a deficiency of deficiency. But as there is no excess and deficiency of temperance and courage because what is intermediate is in a sense an extreme, so too of the actions we have mentioned there is no mean nor any excess and deficiency, but however they are done they are wrong; for in general there is neither a mean of excess and deficiency, nor excess and deficiency of a mean.

– Aristotle, "Nicomachean Ethics", Book II, Ch. 6

In the "Nichomachean Ethics", Aristotle (or rather his student Plato writing in his master's name) develops a system of ethics on the basis that one must always choose the mean between excess and deficiency.

There is a fashion in the software community to be "pragmatic". That is, to be tempered or practical with respect to an attitude or policy. Unfortunately, if you temper temperance with an added dose of reality, you no longer have "pragmatism"; you have a regression. Software as an industry is built on an incrementalism of "progress" which is informed by experience but largely unaware of, and often dismissive of, the possibility space.

Before we can claim to be pragmatic, we must have a vision of the perfection from which we turn aside in practical interests. It is my opinion that the overwhelming majority of working programmers are not conscious of what could be; of how higher order tools could help them; of how the pains they feel are the fault of specific failures of their tools and are not essential.

My goal with this talk was to share with you some of the philosophy and purism which I carry with me and which I hope helps me to strive for more perfect solutions. I hope that it helps you too.

Thanks.

Techwriting 101

At Twitter we have regular employee training courses on a number of subjects. These are my notes from our class on writing technical documentation for developers.


What are docs

Anything from diagrams to stained bar napkins. Could be a book, could be a public website, could be an internal resource.

Docs are the way that experts re-visit subject domains. They're how your customers / other service owners read in on your system. They're how new hires will start orienting themselves to the Twitter environment.

Why docs

Google study: 1/3rd of all code searches are looking for examples or specific API information.

Django: 120K lines of English, 80K lines of code. If you haven't written your docs you haven't written your product.

Outline

Gonna go over three main sections...

  1. The secret to writing good docs (know thy audience)
  2. Writer's block
  3. Polishing (patterns & antipatterns)

1. Know Thy Audience

Who are you preparing the document for? Other developers? A non-technical user? Yourself in six months? An SRE at 2 a.m. when things are on fire? People with no context (a far distant team or new hires), and remotes who need to be able to find information without constantly leaning on someone.

Common kinds of documents & audiences (think job to be done)

  1. Introductory material
  2. Runbooks
  3. Service docs
  4. Design docs
  5. Cookbooks
  6. Team pages
  7. Troubleshooting
  8. Style guides
  9. Policy pages
  10. Roadmaps

What do your readers know coming in?

Different sets of documentation serve different purposes. What knowledge do your readers already have, and what can you assume?

Most developer docs err on the side of assuming knowledge. Lots of possible reasons for this, but it's an antipattern to be aware of.

What's the takeaway your document should communicate?

Doesn't have to be a five-paragraph essay in MLA format, but you should have a TL;DR or thesis. This helps avoid problems like the interminable FAQ.

Audience Creep

Know the scope of the doc. Often you'll incrementally expand the vision of the document until it tries to be everything to all people at which point it's kinda useless.

2. Writer's Block

JUST. FRIGGIN. START. WRITING.

A metaphor: activation energy. Anything you can do to lower the activation energy encourages the reaction to occur. Once you get into a project you're "in the flow" and it's easy to keep hacking/writing, and you want to do whatever you can to stay there.

Forcing functions: pomodoro, The Most Dangerous Writing App, etc. Lifehacks and so forth abound here.

Often the way to get started is to just braindump.

2½. The documentation lifecycle

  1. Rapid development - visual artists & cartoonists. You start with shapes and just start doodling, iteratively refining. Someone dumped a wall of text, someone else organized it, someone else added examples... etc.

  2. Feedback - People read it and comment. Someone else goes on-call and has comments / discovers things which aren't written down. Even coming back to it the next day helps, but really it's about getting someone with a new context - someone who doesn't share yours - to look at the document. This is why docbird has code review as part of the workflow; it ensures that someone else has to put eyes on the document. You've got to build this into your team's workflow.

  3. Maintenance - Documentation which isn't true to the live system is pretty much worthless.

3. Writing really super amazing docs!

Antipattern: docs that aren't docs

It's not documentation to tell someone to ask someone else. Guideposts of where to do discovery are helpful but they aren't sufficient.

Antipattern: nerdview

Defined by the Language Log crowd

Nerdview is the tendency for people with any kind of technical or domain knowledge to get hopelessly stuck in their own frame of reference

Imagine reserving a rental car online, being asked to enter a date in DD/MM/YYYY format, and getting the error message "Please pick a date greater than today". Nobody says "greater than today". That sentence only makes sense when you're thinking about dates as integral values to be compared, as in a computer system.

This is really just an instance of knowing your audience.

Antipattern: noun piles, "crash blossoms" and "garden path sentences"

Just adding hyphenation to clarify qualification can help a lot.

Antipattern: Academese/technicalese

Don't slip into formalism just because you're writing docs.

Antipattern: slang

Non-English-first speakers exist. We employ lots of 'em. Know thy audience and try not to overuse slang, if only for the purpose of clarity.

Antipattern: Buzzwords

Type-safe! Modular! Parameterized!

Sometimes these help, sometimes these just add volume. Be aware.

Antipattern: Acronyms

Define them! Did you know that we have TSA and ACID at Twitter, neither of which means what you think it means in common outside-Twitter usage?

Antipattern: Birds

DON'T FUCKING NAME YOUR THING AFTER A BIRD. STOP IT. THE JOKE IS DEAD. IT'S NAILED TO ITS PERCH. IT'S SLID DOWN THE CURTAIN AND JOINED THE CHOIR INVISIBLE. IT IS AN EX-PARROT.

Pattern: Simplicity all the way down

Simple pages

10-20min ABSOLUTE LIMIT on a given page of docs. Some say 5-10min is better... it depends on what your use case is and how you expect your audience to read. Again, think: what is the job of the document, and what kind of attention do you expect your reader to direct to it?

Simple sections

Sections should be single-topic and targeted.

"Tasks not tools". Title your section in terms of what problem your reader is likely trying to solve, becuase only rarely is your audience reading to lear rather than reading trying to solve a problem. Can you title your question as a question or a gerrund rather than as a spare noun?

Remember that while you may write your docs front-to-back people will tend to discover them via some search tooling, so you shouldn't be afraid of repetition and certainly not of linking back to things you've already written about.

Simple paragraphs

Don't be afraid of single sentence paragraphs.

Don't present information that's better rendered as a table or a diagram as prose just because prose is what's convenient / easy to write in your editor of choice.

Simple Sentences

"Someone said that words are a lot of inflated money - the more of them you use the less each one is worth"

~ Malcolm Forbes

Simple Words

The beginning of knowledge is to call something by its proper name

~ Confucius

Coherence vs. Completeness

Your doc should hang together. You should be able to sum up its topics. Coherence fits naturally into a discussion of completeness.

Completeness is kinda naturally at odds with coherence: completeness is saying everything, while coherence is saying just enough or just what is needed. Same as #shipit vs #berigorous. What are the top three use cases for this doc? What are the next three? Can you hide complexity with links?

You can kinda only keep five things in your working memory at once… three is kinda this nice magical number. Linking is a great way to reference complexity without becoming massively monolithic.

Templates & metadata

Oftentimes you'll open a document and kinda have no idea who the author was, what the status is, what the changelog was, and so on and so forth.

By using templates which provide those empty headings/sections/fields it becomes easier to get it right and write all that down.

Templates also help you be consistent in the structure of your documentation, which means that users can predict and rely on the structure of your content which makes docs more approachable.

Correctness, Maintenance & Culture of Documentation

  • Docs have to be treated as a first class priority & are part of the product.
  • Products aren't shipped unless you have docs &c.
  • Teams recognize that docs take time and allocate time for building documentation.

Formatting & Variation

In lots of docs you have this "wall of text" antipattern. This is particularly bad in Confluence because the editor is weak and lines are long.

Formatting helps make the high-level structure of the document easier to understand / visualize / work with.

Lists, boldface, italics and break-out boxes help display structure tastefully. BUT DON'T OVERDO IT. If everything has a warning box, your users will ignore all of them, and so forth.

Aside: overdoing it. If you can't simplify the tech docs, it's commonly a smoking gun that the thing you're documenting is too complex.

Edward Tufte's The Visual Display of Quantitative Information.

Pie charts are the worst. Bar charts and tables are better almost universally.

Closing bits

  • Examples! Write them! They should be as the user will use them!
  • Interleave code and prose! Interactive examples are awesome!
  • Permalinks! Twitter has its own nice internal permalinks system, but these can make it a lot easier to find and reference the document you want.
  • Use a prose linter! (http://hemingwayapp.com etc.)
  • Teach a class / make a video!

Chowderheads [2004]

Volker Armin Hemmann wrote:

if we are talking about people who are only able to install gentoo because of an automated, graphical installer, then we will get: a lot more 'bug reports' that are not ones. a lot more really really stupid questions (I wait for the day, someone asks, where he can find the gentoo-homepage), and no new developers (but a lot more work for the existing ones).

One might also imagine we'd get less questions on actually installing Gentoo and more on doing stuff with it. There I go again with my tricky logic.

All successful and useful projects get new people. It's a fact of life and frankly if you aren't, you're doing something wrong. That holds true from the Gentoo project to the Roman Empire. If you can not integrate new people successfully into your organization, it will fail.

Gentoo has in fact from the very start been about making things easier. Easier to pick the packages you want, easier to upgrade, easier to custom build packages, easier to update etc files, etc. Gentoo has even gone out of its way to make better how-tos and is known in the Linux community at large for having just about the most useful and friendly forums.

Gentoo can either continue extending the infrastructure to support the people being attracted to a damn useful distro. Or clowns like you can attempt to keep Gentoo all to yourself with Jim Crow style exclusionary tactics.

kashani

Via marc.info