if we are talking about people who are only able to install Gentoo because of
an automated, graphical installer, then we will get: a lot more 'bug reports'
that aren't really bugs, a lot more really really stupid questions (I await the
day someone asks where he can find the Gentoo homepage), and no new
developers (but a lot more work for the existing ones).
One might also imagine we'd get fewer questions about actually installing Gentoo and more about doing stuff with it.
There I go again with my tricky logic.
All successful and useful projects get new people.
It's a fact of life, and frankly if you aren't attracting new people, you're doing something wrong.
That holds true from the Gentoo project to the Roman Empire.
If you cannot integrate new people successfully into your organization, it will fail.
Gentoo has in fact from the very start been about making things easier.
Easier to pick the packages you want, easier to upgrade, easier to custom build packages, easier to update etc files, etc.
Gentoo has even gone out of its way to make better how-tos and is known in the Linux community at large for having just about the most useful and friendly forums.
Gentoo can either continue extending the infrastructure to support the people being attracted to a damn useful distro.
Or clowns like you can attempt to keep Gentoo all to yourself with Jim Crow style exclusionary tactics.
At work, I'm partially responsible for managing a large puppet deployment.
Puppet is largely configured in terms of providers packaging resources & templates which are used to lay down configuration files on disk to manage host behavior.
It is, after all, a configuration management tool.
And it works great for the problem domain of "I have lots of servers with lots of state to configure".
But I have dotfiles.
The way that I manage dotfiles needs to capture the idea that I want to merge many different components together into the state of the given machine.
Sounds like puppet...
But my dotfiles have to be self-bootstrapping, and they can't presume access to any particular master node.
These two requirements are actually pretty firm, because whenever I bring up a new laptop or a new VPS, it's basically: clone my dotfiles, run the install script in the repo, and done.
GNU Stow has served me really well so far.
It totally solved my initial dotfiles problem, which was "I want to just symlink a bunch of stuff out of a repo".
When I wanted to start organizing the repo, stow became less ideal and I wound up with a wrapper script around stow which had some notion of sets of repository delivered stow packages to install on a given host.
And that worked for quite a while on my personal machines.
Then I had a problem at work.
I use ~/.zsh/conf/* as a holding pen for a bunch of configuration files regarding variables, functions and general shell bringup.
When I was only configuring my personal machine(s), stow worked fine because I had the same settings everywhere.
Stow installs a whole zsh package from the repo, which includes a .zsh and a .zsh/conf and you're all set.
The problem came when I realized that some of my configurations only apply to my work machine.
Because I use a Mac at work, I have some extra stuff I need to do with the path and homebrew to get things quite right, and I don't want to burden my Linux install(s) with all that.
A monolithic zsh package won't cut it anymore.
But Stow only deals with packages consisting of whole directories!
It won't let you manage single files!
Really what I want is not a stow which lays down symlinked directories, but a tool which makes real directories and symlinks individual files.
That way I could write several packages, say zsh and work_zsh which would merge together into the ~/.zsh tree the way I want.
So let's start laying out files on disk.
Okay, so that's not hard.
It just requires ln, appropriate string quoting, and support for a debugging mode.
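Roughly, the shape of such a helper is something like this. This is a sketch, not the actual gist code; the function name and the DEBUG convention are my illustrative guesses:

```shell
# Sketch of a file-linking helper: wraps ln with quoted paths, and a DEBUG
# mode that prints what would be done instead of doing it.
# Name and conventions are illustrative, not the actual script's.
function installf() {
  local src="$1" dst="$2"
  if [ -n "${DEBUG:-}" ]; then
    # Debug mode: show the command we would have run.
    echo "ln -s '$src' '$dst'"
  else
    ln -s "$src" "$dst"
  fi
}
```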
We're also gonna need a thing to lay out directories on disk.
Okay so this function will accept an un-normalized path (think a/./b/c), normalize it (a/b/c) and ensure it exists.
Because we're interacting with a file/directory tree which Stow used to own, it's entirely possible that there used to be a symlink where we want to put a directory.
So we have to support a force mode wherein we'll blow that away.
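A sketch of that directory helper, with the caveat that the normalization here is deliberately crude and the name and FORCE convention are my guesses:

```shell
# Sketch of a directory-installing helper: normalize a path like a/./b/c
# to a/b/c, remove a stale symlink in the way when FORCE is set (Stow may
# have owned this spot before), then ensure the directory exists.
function installd() {
  local dir
  # Crude normalization: collapse /./ segments.
  dir="$(echo "$1" | sed 's|/\./|/|g')"
  if [ -L "$dir" ] && [ -n "${FORCE:-}" ]; then
    rm "$dir"    # blow away the old symlink so a real directory can exist
  fi
  mkdir -p "$dir"
}
```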
So now let's write a stow alternative that does what we want.
So if we just want to install a single file, we normalize the file and compute the path we want to install the file to.
Now it's possible, since this is a symlink based configuration management system, that the target file already exists.
The existing file could be a dead (old) symlink, or it could be a link we already placed on a previous run.
We can use realpath to resolve symlink files to their paths, and determine whether we have a link that already does what we wanted to do.
In the case of an existing file and $FORCE we'll clobber, otherwise we'll only install a new link if there isn't something there.
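The logic of that single-file install, sketched out (again a guess at the shape, not the gist itself):

```shell
# Sketch of single-file stowing: resolve the repo file with realpath,
# skip work if an equivalent link already exists, clobber under FORCE,
# and otherwise refuse to touch an unrelated existing file.
function stowf() {
  local src dst
  src="$(realpath "$1")"   # file in the repo, fully resolved
  dst="$2"                 # where the link should live
  if [ -e "$dst" ] || [ -L "$dst" ]; then
    if [ "$(realpath "$dst" 2>/dev/null)" = "$src" ]; then
      return 0             # the link already does what we want
    elif [ -n "${FORCE:-}" ]; then
      rm "$dst"            # clobber whatever was there
      ln -s "$src" "$dst"
    fi                     # otherwise leave the existing file alone
  else
    ln -s "$src" "$dst"
  fi
}
```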
Great so this deals with installing files, assuming that we want to map from ./foo to ~/foo.
Now we can really write our stow, which will eat an entire directory full of files & subdirectories and emplace them all.
It turns out that the only really bash-safe way to support filenames and directory names containing arbitrary whitespace or other characters is to use find -exec bash, rather than parsing the output of find.
If you try to parse find's output, you wind up having to designate some character as the special delimiter.
I thought that whitespace was a safe assumption and found out I was wrong, so I wound up taking this approach of using the -exec option to construct recursive bash processes calling my exported functions.
Which is why I've been exporting everything all along.
So given ~ as the install target, and ./foo/ as the package to install, we'll cd into foo in a subshell (so we don't leak CWD state), then find & arrdem_installd all the required directories.
Then it will arrdem_stowf all the files, with a couple of exceptions.
README, BUILD and INSTALL files are exempted from this process so we don't litter ~ with a bunch of files which aren't logically a part of most packages.
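Put together, a self-contained sketch of that whole-package emplacement looks something like the following. The function name and exact find invocations are my reconstruction of the scheme, not the gist verbatim:

```shell
# Sketch of whole-package stowing via find -exec bash -c, which keeps
# filenames containing whitespace safe without parsing find's output.
function stowd() {
  local pkg="$1" target="$2"
  (
    cd "$pkg" || exit 1     # subshell, so we don't leak CWD state
    # Recreate the package's directory tree under the target...
    find . -type d -exec bash -c 'mkdir -p "$1/$2"' _ "$target" {} \;
    # ...then symlink every file into place, skipping package metadata.
    find . -type f ! -name README ! -name BUILD ! -name INSTALL \
      -exec bash -c 'ln -sf "$(realpath "$2")" "$1/$2"' _ "$target" {} \;
  )
}
```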
Which will get the job done, but still leaves me with the problem of picking and choosing which packages to install on a given host.
I can solve this problem by going full puppet, and defining a concept of a profile, which consists of requirements of other profiles or packages.
When I ./install.sh on a host, it's gonna try to install the profile named $(hostname) first, falling back to some default profile if I haven't built one out for the host yet.
Requirements files will be special, and let a profile list out the other profile(s) and package(s) which should be installed with it.
This lets me say, for instance, that the profiles.d/work profile is defined to be profiles.d/home plus the package work-zsh.
Or rather, that profiles.d/work is profiles.d/default plus a bunch of stuff, while profiles.d/home is entirely independent and includes configuration(s) like my Xmonad setup which are irrelevant to a Mac.
So first we need to be able to install something we consider to be a package.
Packages are directories which may contain the magical files README, BUILD and INSTALL.
If there's a BUILD file, execute it before we try to install the package.
This gives packages the opportunity to do host-specific setup, such as compiling fortune files with ctags.
The INSTALL script gives packages an escape hatch out of the default package installation behavior, say installing OS packages rather than emplacing resources from this directory.
Otherwise, we just treat the directory as a normal stow package and install it.
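That dispatch is simple enough to sketch. The final echo here is a stand-in for the actual stow-style emplacement step, and the function name is my guess:

```shell
# Sketch of package-level install logic: run BUILD if present for
# host-specific setup, let INSTALL take over entirely if present,
# otherwise fall back to normal emplacement (stubbed here with an echo).
function install_package() {
  local pkg="$1"
  if [ -x "$pkg/BUILD" ]; then
    ( cd "$pkg" && ./BUILD )   # host-specific setup step
  fi
  if [ -x "$pkg/INSTALL" ]; then
    ( cd "$pkg" && ./INSTALL ) # package opts out of normal emplacement
  else
    echo "stow $pkg"           # stand-in for the stow-style emplacement
  fi
}
```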
Okay, so now we need to support profiles.
So if there is a directory with the given profile name, and if there is a requires file in it, we pattern match profiles & packages out of the requires file & install them.
For good measure, install any packages which may be included in the profile's directory.
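In sketch form, profile resolution is a small recursion. The `requires` file name and the layout are my reconstruction of the scheme described, and the echo stands in for actually emplacing the profile's packages:

```shell
# Sketch of profile resolution: install everything listed in the
# profile's requires file (depth-first), then the profile itself.
function install_profile() {
  local root="$1" name="$2" dir="$1/$2"
  local req
  if [ -f "$dir/requires" ]; then
    # Each non-empty line names another profile to install first.
    while read -r req; do
      [ -n "$req" ] && install_profile "$root" "$req"
    done < "$dir/requires"
  fi
  echo "installed $name"   # stand-in for emplacing the profile's packages
}
```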
Now we just need a main to drive all this.
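The main is just the hostname-or-default fallback described earlier, sketched here with the install step stubbed out:

```shell
# Sketch of the install.sh entry point: prefer a profile named after
# this host, falling back to a default profile if none exists.
function main() {
  local profile="profiles.d/$(hostname)"
  # Fall back to the default profile if this host has no dedicated one.
  [ -d "$profile" ] || profile="profiles.d/default"
  echo "would install $profile"  # stand-in for the profile install step
}
```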
At this point we've built out a shell script which depends only on find, bash and realpath but can support some really complex behavior in terms of laying down user config files.
As hinted above, this could install OS packages (or homebrew).
By making heavy use of foo.d directories, it becomes super easy to modularize configurations into lots of profiles & merge them together for emplacement.
Best of all, in debug mode it becomes pretty easy to sort out what's coming from where with a grep, or you can just stat the emplaced symlink(s), which will give you a fully qualified path back to the resources they alias.
Not bad for a one-day garbage puppet implementation.
The code for this monstrosity is available here as a gist, but comes with a disclaimer that it's a snapshot of the working state of my dotfiles repository as of this article's writing and may be suitable for no purpose, including my own usage.
So Daniel Compton (@danielwithmusic) is a good bloke.
We've been co-conspirators on a number of projects at this point, and I just wanted to share a quick vignette before I pack it in for the night.
Almost a year ago, James Brennan (@jpb) was kind enough to offer up a pull request (#158) to the kibit tool which Daniel and I currently maintain.
We're both relatively inactive maintainers all things told.
Kibit largely does what it's supposed to do and is widely used for which neither of us can take much credit.
We're stewards of an already successful tool.
Between Daniel's day job and my move out to San Francisco #158 just got lost in the shuffle.
It's an awesome feature.
It enables kibit to, using the excellent rewrite-clj library, automatically refactor your code for style.
If kibit can find a "preferred" replacement expression, thanks to James's work #158 enabled kibit to make the replacement for you.
While Daniel and I kinda just watched, James pushed it to feature completeness and found a significant performance win which made it not just a compelling feature but fast enough that you'd want to use it.
Then a couple months passed.
Daniel and I had other things to do and presumably so did James.
At this point James' patch adding exactly this feature had sat idle for many months.
Some other changes had come in, seen more activity, and been merged.
James' changeset now had conflicts.
Following the github help docs for how to check out a pull request (spoiler:
git fetch $UPSTREAM_REPO pull/ID/head:$NEW_LOCAL_BRANCHNAME) I had James' patches on my laptop in less than a minute.
git merge immediately showed that there were two sources of conflict. One was that the kibit driver namespace had been refactored for style, and a docstring had been added to the main driver function which James' patches touched.
The other was that dependencies had been bumped in the project.clj.
Fixing this took....
The test suite was clean, and in 11 minutes Daniel had merged my trivial patch, getting James' awesome work done and live.
The whole process of about 10 months was overwhelmingly waiting.
James finished the patch in like four days (April 20 '16 - April 26 '16).
Daniel and I were just bad maintainers at getting it shipped.
Were Daniel and I worse maintainers, we could have seen #177 come in and asked either Cody or James to update the patch.
It would have taken maybe five minutes tops to write that mail and maybe it would have saved me 15 minutes and Daniel 5.
After months of waiting? Why?
I've written before about my own thoughts on code review after working in an organization which is high trust, high ownership and sometimes it feels high process anyway.
In this case, and I'm sorry to say almost a year late, I went by what I've come to believe: that reviewers should make an effort to take responsibility for merging code rather than requiring the primary author to do all the legwork.
Sure I could probably have pinged James or suckered Cody into writing that merge commit but why?
What does that buy anybody?
It was so, so easy to just take James' changes and merge myself rather than asking someone else for trivial revisions.
And it makes for a better process for contributors.
It's not their fault that your project has grown merge conflicts with their changes.
If there had been a huge conflict, or James' changes had seemed somehow deeply unreasonable it would have been a different story.
But going the last mile for your contributors is worthwhile.
Let's talk about another concept that's as old as the hills - code review.
"Design and code inspections to reduce errors in program development" [Fagan 1976] (pdf) introduced the notion of a structured process for reviewing programs & designs.
The central argument which Fagan presents is that it is possible to quantitatively review software for flaws early in the development cycle, and to iterate on development while the cost of change is low, compared to the cost of iterating on software which has already been deployed to customers.
The terminology is a little archaic, but in all the elapsed time the fundamental idea holds.
Code review for Fagan is as much an architectural design review as it is anything else.
This shouldn't be terribly surprising, given some of the particular concerns Fagan's process is designed to address.
While many of these things haven't intentionally changed, some of these concerns such as the specifics of register restoration reflect the paper's age.
The underlying goal of the code review process, to examine software for flaws early and often, has not changed meaningfully in the intervening decades. Many of the particulars of the process described by Fagan, however, reflect a rigidity of process and a scale of endeavor which is no longer reflective of the state of industry at large.
Fagan's process is designed to prevent architecture level mistakes through intensive review, as well as to detect "normal" bugs en-masse and provide a natural workflow for iterative searching for and fixing of bugs until the artifact is deemed of sufficient quality.
This is a work of process engineering optimized for code being slow & difficult to write, and for software being slow & risky to ship.
So what does a modern code review system look like?
What makes for a good code review?
What has changed?
With the cheapening of computer time, advent of integrated testing systems, generative testing, continuous integration and high level languages, many of the properties which previously required extensive deliberate human review can now be automatically provided.
Likewise, modern linters & formatters can provide extensive stylistic criticism and enforce a degree of regularity across entire codebases.
Continuous delivery systems and incremental deployment methodologies also serve to mitigate the expensive "big bang" releases which informed Fagan's process.
Continuous or near continuous delivery capabilities mean that teams can be more focused on shipping & testing incremental products.
Artifacts don't have to be fully baked or finalized before they are deployed.
Similarly, linters & other automatic code inspection, together with the advantages of high level languages, at once make it possible to make meaningful change to artifacts much more rapidly and to automatically detect entire classes of flaws for remediation before an author even begins to engage others for review.
Ultimately, the job of every team is to deliver software.
In many contexts, incomplete solutions, delivered promptly & iterated on rapidly, are superior to fuller solutions on a longer release cadence.
What does this mean for code reviews?
Reid's Rules of Review
True to the style of The Elements of Style, these rules are hard, fast and have exceptions.
They're deeply motivated by the tooling & engineering context described above.
If your team has missile-launching or life-ending reliability concerns, you'll want a different approach.
If you can't trivially test, re-deploy, or partially roll out your code, you'll also want a different approach.
This is just the way I want to work & think people should try to work.
1. Ensure that the artifact is approachable.
If you are not able to understand a changeset or a new artifact, let alone the process by which its author arrived at the current choice of design decisions & trade-offs, that is itself a deeply meaningful criticism: it means both that the code is unclear and that the motivational documents are deeply lacking.
As the reviewee the onus is on you to enable your reviewers to offer high level criticism by removing low level barriers to understanding.
1.1. Corollary: Write the docs.
As the reviewee, how are your reviewers supposed to understand what problem(s) you're trying to solve or the approach you're taking if you don't explain it?
Link the ticket(s).
Write docstrings for your code.
Include examples so that it's obvious what and how.
Write a meaningful documentation page explaining the entire project so that the why is captured.
1.2. Corollary: Write the tests.
Those examples you wrote?
They should be tests.
I'm totally guilty of the "it's correct by construction! stop giving me this 'tests pls' crap!" attitude, but it's a real anti-pattern.
As the reviewee, even to the extent that you may succeed in producing or composing diamonds, someone will eventually come along and refactor what you've written, and if there aren't tests covering the current behavior, who knows what will happen then.
You may even be the person who introduces that regression and won't you feel silly then.
Furthermore tests help your reviewers approach your code by offering examples & demonstrations.
Tests aren't a replacement for documentation and examples, but they certainly help.
1.3. Corollary: Run the linter.
If you have a style guide, stick to it insofar as is reasonable.
As the reviewee, if you've deviated from the guide to which your coworkers are accustomed, you've just made it harder for them to meaningfully approach and criticize the changes you're proposing.
You've decreased the review's value for everyone involved.
2. Criticize the approach.
Algorithmic or strategic commentary is the most valuable thing you can offer to a coworker.
Linters and automatic tooling can't really help here.
Insight about the future failings of the current choice of techniques, benefits of other techniques and available tools can all lead to deeply meaningful improvements in code quality and to learning among the team.
This kind of review may be difficult to offer since it requires getting inside the author's head and understanding both the problem and the motivations which brought them to the decisions they made, but this can really be an opportunity to prevent design flaws and teach.
It's worth the effort.
3. Don't bike shed your reviewee.
If the code works and is close to acceptable, leave comments & accept it.
The professional onus is on the reviewee to determine and make appropriate changes.
It's not worth your or their time to go around and around in a review cycle with a turn around time in hours over this or that bikeshed.
3.1. Corollary for the reviewer: Style guides are something you apply to yourself, not something you do to others.
⚠⚠ EXCEPT IN THE MOST EGREGIOUS CASES ⚠⚠
If someone clearly threw the style guide out the window or didn't run the linter, then a style guide review is appropriate.
Style guides should be automatically enforced, or if tooling is not available then they should be mostly aspirational.
What's the point of wasting two or more humans' time doing syntactic accounting for minor infractions?
If it takes half an hour to review a change, maybe another hour before the reviewee can respond, half an hour or more to make the requested changes, and then the updated changeset has to be reviewed again, syntax and indentation bike sheds in code review can easily consume whole work days.
3.2. Corollary for the reviewer: Don't talk about performance concerns unless you have metrics in hand.
Need to push thousands of requests per second?
Yeah you may care about the performance of an inner loop somewhere.
There, performance criticisms are meaningful, and you should already have performance metrics in hand.
Got a service that'll see a few hundred requests at the outside?
It can probably be quintic and still get the job done.
It is far better to write inefficient code which is easy to understand & modify, ship it and iterate.
If legitimate performance needs arise, code which can be understood and modified can always be refactored and optimized.
Code which is optimized early at the expense of understanding and modifiability is a huge mistake. Much as it may make the reviewee or the reviewer feel clever to find some speedup, that speedup may or may not add value, and the semantic cost of the optimizations increases the maintenance or carrying cost of the codebase as a whole.
3.3. Corollary for the reviewer: Don't be dogmatic.
There are exceptions to every rule.
There are concrete time and morale costs to requesting change.
Be mindful of these, and remember that you're all in the same codebase together with the same goal of shipping.
4. Hold your coworkers accountable after the fact and ship accordingly.
If their service is broken, they who broke it get to be on the front lines of fixing it.
The consequence of this is that you should trust your coworkers and prefer shipping their code thanks to a culture of ongoing responsibility.
Your default should be to accept code and ship code more or less whenever possible.
In short, don't be this guy:
all code ends up as part of a negative diff at some point, some code earlier than others
In software, there is an ever-present temptation to declare that something is finished.
To look upon an artifact, to pronounce it perfect, and to believe that it will persist unchanged for all time.
This is the model of "martian computing" which begat the Urbit project.
And it's wrong.
A specification is a precise description of what an entity is, typically written in terms of decomposition.
An abstraction is an attempt to describe an entity, or class of entities, in more general terms.
Where a specification will define precisely how something happens, an abstraction will merely state that it will happen.
Abstractions may be judged by their hardness -- that is, the strength of the invariants they enforce internally or provide externally, and those which they require but leave to their environment to ensure.
Some abstractions, like the idea of a generator or a stream, are weak in that they require little and provide little.
All the notion of a generator exports is a contract or pattern for getting more values and by which the source will signal when its end has been reached.
Yet this is a convenient model for the sequential consumption of any number of chunked sequential or eventual value sources which presumes nothing about how the values are generated.
We can define the abstraction of
filter :: (a → Bool) → [a] → [a]
(in Haskell notation, that's "filter is a function from a function which returns a boolean for any a, and a source of as, to a source of as") to be x for x in xs if f(x).
In Python, this exact formulation is an explicitly sequential generator which preserves the order of elements.
But what does filter actually have to do?
Does the order of elements matter?
When should an element's membership in the result be determined?
Does it matter?
Why would it matter?
The type of filter is part of the abstraction, but it is a weak contract compared to either of the operational formulations above.
Consider what other functions could be defined that satisfy the type signature (a → Bool) → [a] → [a] as above.
You could define a function which repeats the first element for which the provided function is true forever.
You could define a function which repeats the 2nd element for which the provided function is true only as many times as there are elements in the input sequence.
You could define a function which ignores the function argument and returns the input sequence.
You could define a function which ignores the function argument and returns the input sequence reversed.
And on and on and on and on.
A more precise definition of filter would be ∄x∈filter(f, xs) | f(x) is false.
(Note: to unpack the notation, that is "there is no x in filter(f, xs) such that f(x) is false")
This is a far better, more general abstraction.
At an operational semantics level, filter could shuffle.
It could operate in parallel on subsequences and return a parallel "first to deliver" concatenation.
It could be lazy or any manner of other things.
Let's consider another abstraction - the (first, next) or cons cell.
| first | next | -> | first | next | -> null
This is, honestly, a really bad abstraction because it's quite explicit about the details.
Heck, the names "cons", "car" and "cdr" are all historical baggage.
However this is an abstraction.
It provides the notion of the first of a list, the next or rest of the list, and the end of the list being nil.
In doing so it provides a model for thought, to be sure, but it hides none of the details of the machine.
As processor core speed has outstripped memory access speed and as caches have become more and more important for circumventing the von Neumann bottleneck, it has become a progressively less relevant abstraction because it is precise about machine details which are less and less appropriate to modern machines.
For this reason many Lisp family systems choose to provide what are referred to as CDR-optimized or chunked lists.
These are list-like structures wherein a number of value links are grouped together with a single next link.
| first | second | third | fourth | fifth | sixth | // | next | -> null
For instance, a list of eight elements could fit entirely within a single chunk, occupying a contiguous block of memory which provides more cache locality for linear traversals or adding elements to the end.
However, this chunked model makes splicing sub-lists, slicing, or explicitly manipulating next links expensive because the next link doesn't exist!
For instance, if one were to try to slice the sub-list [1...5] out of (0, 1, 2, ..., 10) encoded as a CDR₇ list, one could build a "sub-list" structure which refers to the substructure of the source list.
The instant one tries to alter a link pointer within the extracted sub-list, the entire sub-list must be copied so that there exists a link pointer to be manipulated.
However all these approaches to chunking, slicing, and manipulation still easily provide a common first, next, end sequence traversal abstraction.
So what does this mean about abstractions generally?
Abstractions are models for computation and are relevant in a context.
For instance, big-O analysis of an algorithm is an analysis of asymptotic performance with respect to an abstract machine.
It is not a precise analysis of the performance of the algorithm with respect to the average or worst cases on a physical machine.
These details, however, are the things which programmers care about.
O(N) could mean T(100*N) or T(N/2).
In order to be useful for practicing programmers, abstractions must eventually become more detailed than they need to be as tools for proof.
It is not enough to know that f(xs) will be sorted; programmers are at least accustomed to expectations that f(xs) will run in such and such time and space.
Were those expectations to be violated or suddenly change, program architecture decisions which presumed those performance properties would have to be revisited.
Church numerals are an interesting case of this mismatch between tools for thought and tools for implementation.
They're a useful tool for expressing abstract arithmetic and repetition in a proof divorced from any practicable machine.
You can express division, remainders, negatives, and even imaginary numbers this way.
Church numerals provide a natural representation for arbitrarily large values in the context of the lambda calculus.
But they're grossly mismatched with the realities of finite binary machines which work on fixed-length bit vectors.
Bit vector machines can't capture the entire unbounded domain of Church numerals.
Nor can we build a machine which can perform arithmetic on Church numerals with the same performance as a bit vector machine.
It's fundamentally a trade-off between a tool for thought and a tool for implementing and reasoning about a physical machine.
This pattern has consequences for the decisions we make when designing software.
It may be hubristically tempting to conceive of the artifacts we develop as generation ships: constructs which will long survive us without significant structural change, if we but exercise appropriate art and find the right Martian gem. Reality is far less forgiving.
Rarely is there a diamond-hard artifact so divorced from business concerns that it can adequately weather the ravages of time unchanged.
Rather than seek such gems -- or, in failing to produce such a gem, making an excessive number of trade-offs -- good software engineering should be characterized by using and producing a number of small abstractions.
Small abstractions are advantageous because they provide little and expose little, thus involving a minimum number of externalities and minimizing vulnerability to crosscutting concerns.
In order to build a system of any size or complexity, composing several such abstractions is required.
If, due to a change in externalities, one or several such small abstractions become inappropriate, replacing a small abstraction, in the worst case, involves no more impact to the system as a whole than replacing a larger -- or, worse, monolithic (which is to say, no) -- abstraction.
Due to the small changed surface area, it is likely that reuse between the initial and successor system states will be maximized and the cost to transition the system will be lower than if there were a large or so-large-as-to-be-no abstraction which must be replaced almost entirely.
Write better software by decoupling.
Seek to prevent or at least minimize crosscutting concerns.
Take advantage of control flow abstraction and compose abstractions together.
Mars doesn't have a monolithic diamond.
It is a field of small gems.
This document is largely a product of an evening arguing with @ztellman, and of being fixated and barely sober enough to write when I got home.
@argumatronic egged me on to finish this and subsequently was kind enough to copy edit early versions of this for me.
I've been told that many of the ideas appearing here will soon appear in his book, and wanted to note that for the most part Zack got here first.