Puppet, I guess

So I have, or rather should say had, a problem.

At work, I'm partially responsible for managing a large puppet deployment. Puppet is largely configured in terms of providers packaging resources & templates which are used to lay down configuration files on disk to manage host behavior. It is after all a configuration mangement tool. And it works great for the problem domain of "I have lots of servers with lots of state to configure".

But I have dotfiles. The way that I manage dotfiles needs to capture the idea that I want to merge many different components together into the state of the given machine. Sounds like puppet...

But my dotfiles have to be self-bootstrapping, and they can't presume access to any particular master node. These two requirements are actually pretty firm, because whenever I bring up a new laptop or a new VPS it's basically clone down my dotfiles, run an install script in the repo and done.

GNU Stow has served me really well so far. It totally solved my initial dotfiles problem, which was "I want to just symlink a bunch of stuff out of a repo".

When I wanted to start organizing the repo, stow became less ideal and I wound up with a wrapper script around stow which had some notion of sets of repository delivered stow packages to install on a given host. And that worked for quite a while on my personal machines.

Then I had a problem at work. I use ~/.zsh/conf/* as a holding pen for a bunch of configuration files regarding variables, functions and general shell bringup. When I was only configuring my personal machine(s), stowk worked fine because I had the same settings everywhere. Stow installs a whole zsh package from the repo, which includes a .zsh and a .zsh/conf and you're all set.

The problem came when I realized that some of my configurations only apply to my work machine. Because I use a Mac at work, I have some extra stuff I need to do with the path and homebrew to get things... quite right and I don't want to burden my Linux install(s) with all that. A monolithic zsh package won't cut it anymore.

But Stow only deals with packages consisting of whole directories! It won't let you manage single files! Really what I want is not a stow which lays down symlink directories, but a tool which makes real directories and symlinks in files. That way I could write several packages, say zsh and work_zsh which would merge together into the ~/.zsh tree the way I want.

So lets start laying out files on disk.

function arrdem_installf () {
  # $1 is the logical name of the file to be installed
  # $2 is the absolute name of the file to be installed
  # $3 is the absolute name of its destination
  #
  # Installs (links) a single file, debugging as appropriate

  if [ -n "$DEBUG" ]; then
      echo "[DBG] $1 -> $3"
  fi
  ln -s "$2" "$3"
}

export -f arrdem_installf

Okay so that's not hard. Just requires ln, makes appropriate use of string quoting and has support for a debugging mode. We're also gonna need a thing to lay out directories on disk.

function arrdem_installd () {
  # $1 is an un-normalized path which should be created (or cleared!)

  dir="$(echo $1 | sed 's/\.\///g')"
  if [ ! -d "$dir" ]; then
      if [ -n "$FORCE" ]; then
          rm -rf "$dir"
      fi

      mkdir -p "$dir"
  fi
}

export -f arrdem_installd

Okay so this function will accept an un-normalized path (think a/./b/c), normalize it (a/b/c) and ensure it exists. Because we're interacting with a file/directory tree which Stow used to own, it's entirely possible that there used to be a symlink where we want to put a directory. So we have to support a force mode wherein we'll blow that away.

Alright... so now let's write a stow alternative that does what we want.

function arrdem_stowf () {
  # $1 is the stow target dir
  # $2 is the name of the file to stow

  f="$(echo $2 | sed 's/\.\///g')" # Strip leading ./
  TGT="$1/$f"
  ABS="$(realpath $f)"

  if [ -h "$TGT" ] || [ -e "$TGT" ]; then
      if [ "$(realpath $TGT)" != "$ABS" ]; then
          if [ -n "$FORCE" ]; then
              if [ -n "$DEBUG" ]; then
                  echo "[DBG] Clobbering existing $ABS"
              fi
              rm "$TGT"
              arrdem_installf "$f" "$ABS" "$TGT"
          else
            echo "[WARNING] $TGT already exists! Not replacing!"
            echo "          Would have been replaced with $ABS"
          fi
      else
        if [ -n "$DEBUG" ]; then
            echo "[DEBUG] $TGT ($f) already installed, skipping"
        fi
      fi
  else
    arrdem_installf "$f" "$ABS" "$TGT"
  fi
}

export -f arrdem_stowf

So if we just want to install a single file, we normalize the file and compute the path we want to install the file to. Now it's possible since this is a symlink based configuration management system that the target file already exists. The existing file could be a deal (old) symlink, or it could be a link we already placed on a previous run. We can use realpath to resolve symlink files to their paths, and determine whether we have a link that already does what we wanted to do. In the case of an existing file and $FORCE we'll clobber, otherwise we'll only install a new link if there isn't something there.

Great so this deals with installing files, assuming that we want to map from ./foo to ~/foo.

Now we can really write our stow, which will eat an entire directory full of files & subdirectories and emplace them all.

function arrdem_stow () {
  # $1 is the install target dir
  # $2 is the source package dir
  #
  # Makes all required directories and links all required files to install a given package.

  # If a target directory doesn't exist, create it as a directory.
  # For each file in the source, create symlinks into the target directory.
  #
  # This has the effect of creating merge mounts between multiple packages, which gnu stow doesn't
  # support.
  ( cd "$2"

    # Make all required directories if they don't exist
    #
    # If force is set and something is already there blow it the fsck away
    find . -mindepth 1 \
         -type d \
         -exec bash -c 'arrdem_installd "$0/$1"' "$1" {} \;

    # Link in all files.
    #
    # If the file already exists AND is a link to the thing we want to install, don't bother.
    # Else if the file already exists and isn't the thing we want to install and force is set, clobber
    # Else if the file already exists emit a warning
    # Else link the file in as appropriate
    #
    # Note that this skips INSTALL, BUILD and README files
    find . -type f \
         -not -name "INSTALL" \
         -not -name "BUILD" \
         -not -name "README.*" \
         -exec bash -c 'arrdem_stowf "$0" "$1"' "$1" {} \;
  )
}

export -f arrdem_stow

It turns out that the ONLY really bash safe way to support filenames and directory names containing arbitrary whitespace or other characters is to use find -exec bash, rather than parsing the output of find. If you try to parse find's output, you wind up having to designate some character as special and the delimeter. I thought that whitespace was a safe assumption and found out I was wrong, so wound up taking this approach of using the -exec option to construct recursive bash processes calling my exported functionss. Which is why I've been exporting everything all along.

So given ~ as the install target, and ./foo/ as the package to install, we'll cd into foo in a subshell (so we don't leap CWD state), find & arrdem_installd all the required directories. THen it will arrdem_stowf all the files, with a couple exceptions. README fiiles, BUILD and INSTALL files are exempted from this process so we don't litter ~ with a bunch of files which aren't logically a part of most packages.

Okay. So we can install packages. Great.

This lets me build out a tree like

./install.sh
./zsh/README
./zsh/.zsh/.....
./zsh-work/README
./zsh-work/.zsh/.....
./emacs/README
.....

Which will get the job done, but still leaves me with the problem of picking and choosing which packages to install on a given host. I can solve this problem by going full puppet, and defining a concept of a profile, which consists of requirements of other profiles or packages. When I ./install.sh on a host, it's gonna try to install the profile named $(hostname) first, falling back to some default profile if I haven't built one out for the host yet.

So really the tree will look like

./install.sh
./packages.d/emacs
./packages.d/vim
./packages.d/zsh
./packages.d/work-zsh
./profiles.d/work
./profiles.d/work/requirements
./profiles.d/home
./profiles.d/home/requirements
./profiles.d/default
./hosts.d/$WORK_HOSTNAME
./hosts.d/$WORK_HOSTNAME/requirements
./hosts.d/$LAPTOP_HOSTNAME
./hosts.d/$LAPTOP_HOSTNAME/requirements

where requirements files will be special and let a profile list out the other profile(s) and package(s) which it should be installed with. This lets me say for instance that the profiles.d/work profile is defined to be profiles.d/home more the package work-zsh for instance. Or that rather, profiles.d/work is profiles.d/default more a bunch of stuff and profiles.d/home is entirely independent and includes configuration(s) like my Xmonad setup which are irrellevant to a Mac.

So first we need to be able to install something we consider to be a package

function install_package() {
  # $1 is the path of the package to install
  #
  # Executes the package build script if present.
  #
  # Then executes the install script if present, otherwise using arrdem_stow to install the built
  # package.

    echo "[INFO - install_package] installing $1"

    if [ -x "$1/BUILD" ]; then
        ( cd "$1";
          ./BUILD)
    fi

    if [ -e "$1/INSTALL" ]; then
        ( cd "$1";
          ./INSTALL)
    else
        arrdem_stow ~ "$1"
    fi
}

export -f install_package

Packages are directories which may contain the magical files README, BUILD and INSTALL. If there's a BUILD file, execute it before we try to install the package. This gives packages the opportunity to do host-specific setup, such as compiling fortune files with ctags.

The INSTALL script gives packages an escape hatch out of the default package instalation behavior, say installing OS packages rather than emplacing resources from this directory.

Otherwise, we just tread the directory as a normal stow package and install it.

Okay, so now we need to support profiles.

function install_profile() {
  # $1 is the path of the profile to install
  #
  # Reads the requires file from the profile, installing all required profiles and packages, then
  # installs any packages in the profile itself.

  if [ -d "$1" ]; then
      echo "[INFO] installing $1"

      # Install requires
      REQUIRES="$1/requires"

      if [ -e "$REQUIRES" ]; then
          cat $REQUIRES | while read -r require; do
              echo "[INFO] $require"
              case "$require" in
                  profiles.d/*)
                      echo "[INFO - install_profile($1)] recursively installing profile $require"
                      install_profile "$require"
                      ;;

                  packages.d/*)
                      echo "[INFO - install_profile($1)] installing package $require"
                      install_package "$require"
                      ;;
              esac
          done
      fi

      # Install the package(s)
      find "$1" \
           -maxdepth 1 \
           -mindepth 1 \
           -type d \
           -exec bash -c 'install_package "$0"' {} \;
  else
    echo "[WARN] No such package $1!"
  fi
}

So if there is a directory with the given profile name, then if there is a requires file, pattern match profiles & packages out of the requires file & install them. For good measure, install any packages which may be included in the profile's directory.

Now we just need a main to drive all this.

function main() {
  # Normalize hostname

  HOSTNAME="$(hostname | tr '[:upper:]' '[:lower:]')"
  HOST_DIR="hosts.d/$HOSTNAME"

  if [ -d "$HOST_DIR" ]; then
      # Install the host profile itself
      #
      # It is expected that the host requires default explicitly (or transitively) rather than getting
      # it "for free".
      echo "[INFO - main] installing profile $HOST_DIR"
      install_profile "$HOST_DIR"
  else
    # Otherwise just install the default profile
    echo "[INFO - main] installing fallback profile profile profiles.d/default"
    install_profile "profiles.d/default"
  fi
}

main

At this point we've built out a shell script which depends only on find, bash and realpath but can support some really complex behavior in terms of laying down user config files. As hinted above, this could install OS packages (or homebrew).

By making heavy use of foo.d directories, it becomes super easy to modularize configurations into lots of profiles & merge them together for emplacement.

Best of all in debug mode it becomes pretty easy to sort out what's comming from where with a grep, or you can just stat the emplaced symlin(s) which will give you a fully qualified path back to the resouces they alias.

Not bad for a one-day garbage puppet implementation.

The code for this monstrosity is available here as a gist , but comes with a disclaimer that it's a snapshot of the working state of my dotfiles repository as of this article's writing and may be suitable for no purpose including my own usage.

^d