Architecture and Microarchitecture

From a recent paper review I wrote -

So @ztellman is writing a book The Elements of Clojure which would better be titled The Elements of Software. It’s almost entirely about exploring ideas about abstraction and how abstraction and engineering interact. Maybe worth perusing since this paper seems to hit a bunch of related ideas. To crib some of those:

Science is about dealing with timeless things. The lambda calculus has no engineering constraints. It’s a great substrate for proof, and takes some head standing to become an adequate substrate for doing “real” work because doing “real” work is characterized by trying to fit within a finite machine and finite time.

Engineering is about shoveling the constants around to optimize for how many transistors we can fab at what density and how many we can drive / cool at what power without melting the whole thing. That is, it’s entirely an exercise in factors which are far from timeless and thus not suitable for long lasting abstractions. Architectures like the Itanium, MIPS and TTAs you allude to are the final form of this kind of thinking. It may be momentarily convenient say for performance to expose, as in the MIPS, a pipeline without interlocks or the full physical register set, but as the paper conveys well this constitutes a fragile API which will prove limiting in the future when the engineering constraints which motivated those decisions change.

The bit you don’t seem to hit on yet is what is architecture, and how do we do it? I suggest, still cribbing from Zach, that architecture both in software and in hardware is all about trying to find the perfect layer of indirection between the engineering constraints of the preset (time, memory, chip area, heat, etc.) and the universal properties or statements which can constitute a stable target. That is, you’re trying to find what set of cost trade-offs you can presume are constant-ish for as long as you hope your design will be relevant. Will we still be monocore and care about single cache density? Will the NUMA gap continue to increase? Will we continue to be able to rely on speculation for performance wins, or will we switch to more communication based models of computation where privilage level context switches must become more expensive and communication must be efficient?

To cherry-pick your CAR / CDR example of an architecture feature which has lasted far too long; While the names are absurd and now long irrelevant the notion of the first/next pair has proven to be general, timeless representation for arbitrary graph structures. Which has been repeated time and time again silly names included even in the face of memory architecture [1] changes which render the original conception less and less appropriate.

Straight stack, thunk or SECD VMs follow hot on the heels of really trying not to assume anything. Maybe that’s fine. Maybe as you hint at the end, an abstract register machine with some support for generic operations in the style of the Cray-1 which may be specialized or implemented with accelerators later is appropriate.

The central idea that this paper seems to be toeing is that as implementers with our eyes on what we’ve built our natural tendency when “specifying” is to write down the specific thing we’ve envisioned. From a cognitive laziness perspective it’s the natural thing to do. Taking a step back and realizing what’s essential versus what’s incidental is really goddamn difficult because their (our) job is to have our heads so far down in the microarchitecture that we’re counting clock cycles and cache lines in our heads before we even have a simulator. This sort of clock cycle counting - engineering and the failure of architecture - is what leads to the microarchitecture-as-architecture trap sketched above.

Dunno how much of this is useful without turning it into a philosophy paper, but it does seem like there’s plenty of room to hit on these issues a little more directly and be concrete about the possible advantages of abstract structures like the admittedly absurd x86 floating point stack unit.