Deprecation warnings in Jaunt

This week I'm working on getting Jaunt's 1.9.0 release nearer the door, on which note I'm happy to properly demo one the features of this first release: deprecation warnings.

A bare REPL will do for these demos.

$ cd $(mktemp -d)
$ wget https://clojars.org/repo/org/jaunt-lang/jaunt/1.9.0-RC4/jaunt-1.9.0-RC4.jar
$ java -jar jaunt-1.9-RC4.jar

Jaunt #2 (not my best ticket ever in terms of hygiene) was the first work I did on what became Jaunt and was really pretty critical for getting my teeth into the project and validating that there were tractable incremental improvements to be made over Clojure.

This change introduced four new switches to the Jaunt compiler, the most interesting of which is :warn-on-deprecated which enables or disables compiler warnings in support the use of ^:deprecated metadata on functions and namespaces. :warn-on-deprecated is on (true) by default, although it can be disabled globally with the JVM system property clojure.compiler.warn-on-deprecated=false, or by set!ing clojure.core/*compiler-options* say to have (set! *compiler-options* (dissoc *compiler-options* :warn-on-deprecated)) in a namespace where you want these checks to be disabled.

The semantics of deprecation are simple:

  • A namespace is deprecated if and only if it has the ^:deprecated metadata.
  • A definition (Var) is deprecated if either it occurs within a ^:deprecated namespace, or is itself marked ^:deprecated.

Vars and namespaces may become deprecated in any SemVer minor version or major version, but may only legally be deleted only on a major version.

Deprecation warnings are only emitted when a deprecated namespace or definition is accessed from a context which is not deprecated. Deprecated contexts are namespaces which are deprecated, and the bodies of deprecated definitions.

So for instance, in the following code:

(defn ^:deprecated bad-idea [x]
  (println "Oh noez!")
  (throw (new Exception "Bad code is bad")))
;; => #'user/bad-idea

(defn ^:deprecated also-bad-idea [x]
  (bad-idea x))
;; => #'user/also-bad-idea

(defn almost-better-idea [x]
  (bad-idea x))
;; Warning: using deprecated var: #'user/bad-idea ...
;; => #'user/also-bad-idea

The definition of bad-idea itself is innocuous. Simply evaluating deprecated definitions is fine. Warning that also-bad-idea uses bad-idea would be superfluous, because that fact is innocuous so long as neither is used. Suppressing warnings in this case allows Jaunt to load arbitrarily much deprecated code without drowning a user with false positive warnings.

Users should only have to care about deprecation at the edge between current code and deprecated code. Thus compiling almost-better-idea will emit a warning that it makes use of bad-idea, since almost-better-idea is not itself deprecated. Likewise at the repl invoking either deprecated function would also generate a warning.

As to namespaces, similar principles apply. Consider the two namespaces and trace:

(ns ^:deprecated com.proj.code)
;; => nil

(def some-const 3)
;; => #'com.proj.code/some-const

;;------------------------------------------------------------------------
(ns com.proj.new-code
  (:require [com.proj.code :as old :refer [some-const]]))
;; Warning: aliasing deprecated ns: com.proj.code  ...
;; Warning: referring deprecated var: #'com.proj.code/some-const ...
;; => nil

(def g 4)
;; => #'com.proj.new-code/g

(def h (+ old/some-const g))
;; Warning: using deprecated var: #'com.proj.code/some-const ...
;; => #'com.proj.new-code/h

As the whole com.proj.code namespace is deprecated, merely referencing it and requiring that it be loaded is cause for a warning. Should that namespace ever be removed, the require would break regardless of whether anything from the required namespace is used or not.

If symbols which are deprecated are referred into a namespace which is not deprecated, each one of those will generate a warning as well for the same reason. However Jaunt tries to be smart about this, suppressing such warnings if a (require ' [... :refer :all]) or a (use ...) directive has been issued which constitutes an indefinite referral.

The symbol com.proj.code/some-const, is deprecated by dint of being defined in a deprecated namespace. Consequently a warning is emitted when it is used outside of a deprecated context, here in the definition of com.proj.new-code/h.

One last demo, disabling warnings!

(ns com.proj.no-warnings)
;; => nil

(set! *compiler-options* 
  (dissoc *compiler-options* :warn-on-deprecated))
;; elided...

(def ^:deprecated a 3)
;; => #'com.proj.no-warnings/a

(def b (+ a 4))
;; => #'com.proj.no-warnings/b

There's more than just this coming down the line in Jaunt, so if you're interested check out the 1.9 CHANGELOG, watch the repo or stay tuned here for more demos. Issues, ideas and especially pull requests are welcome :D

^d

Jaunt - A friendly Clojure fork

TL;DR

I've started a fork of Clojure which I intend to operate in a community directed manner; Jaunt. Check out the demo script or just keep reading for more.

The Long Version

After several months of dinking with Oxlang on and off I came to a realization. Ox, while an annoyingly persistent daydream was doomed to be one of two things. Either it would be something akin to Haskell in S expressions, or it would be indistinguishable from Clojure with some tweaks under the hood.

I'd gotten Ox's reader sort of working, and was starting to experiment with notations, explore notions of higher kinded types and generally was having a ball. However even getting that far was a highly educational exercise in how much Clojure already does fairly well. To continue down the road of Ox was to admit that I'd have to reinvent all of Clojure's datastructures, most of the Clojure compiler and a whole boatload of the standard library. It was also to admit that due to limitations of the JVM as a target, I wouldn't be able to play some of the many tricks which the GHC folks get to play even were I to go down the road of a pure language.

Why do all this? Because it's hard? I'm no Heracles seeking labors or yaks for their own sake. I'd rather be learning something new or playing Dota or... a thousand other things.

I swear I would.

Besides, even though Michiel Trimpe was kind enough to mention Ox alongside a number of other "conversational" programming languages in his ClojureX talk, I'm not convinced that it's a good idea.

In looking back at that particular talk, one thing struck me. Sure in a pure functional programming language such as Haskell you could stick an arbitrary AST in a Merkle tree and have a globally distributed version control and distribution system with strong guarantees about tree uniqueness and reuse safety.

Unfortunately, Clojure is no such beast. We have no language level concept of effects (okay we have the io! macro but that's hardly the same). While you could implement immutable namespaces (and in fact they are backed by immutable maps already) the concept doesn't make a whole lot of sense to me.

The very notion of a conversational language is that you can reference any definition from any version of the program as if it were the current version. What does this look like at a REPL? Are namespaces (or rather full states) identified with a commit ID and compilation occurs in terms of a fully qualified "commit" and Var? Now we need notation for that. Maybe version/name/namespace or something. Some sort of explorer to browse older commits is in order... the list goes on and on. In short, I think you wind up reinventing a bunch of stuff that git already does pretty well.

This is not to say that Clojure's existing mutable namespaces are ideal. For me, they work against the ns or file level code loading style I use when developing Clojure code. Old habits from C die hard as it were. A common failure mode I encounter is that I'll rename a function but not all of its uses. IntelliJ and refactor-nrepl can be of some help here, but I've had them break down and I'm not always in a "full ide" configuration. Because namespaces are mutable, even if I were to perform a rename correctly, the old name is still present in the namespace. If I got it wrong, my code reloads, all symbols resolved then I get Really Cool Unexpected Behavior when parts of my code call the new function and parts call the old function. Or worse my code works fine until I restart my repl or the test server and everything explodes. Or the old name just sticks around cluttering up my autocompletion.

This becomes a real UX problem for me, because I rely heavily on refactoring. I start by sketching out what amount to types, or deriving operations from types I already have, and keep working renaming and moving code around until I get something I'm happy with.

And Yet I Shave

The thing which came to me now a few weeks ago is that there's an interesting halfway house between these two extremes of mutable and immutable namespaces. What if we gave Namespaces and Vars version numbers? When a Namespace is re-defined (the ns form is evaluated) then the Namespace's version is incremented. When a def is evaluated, the version of the Var bound by the def is set to be the version of the Namespace in which that Var exists.

This allows traditional REPL style interaction with Namespaces. You may enter a Namespace, add bindings, aliases, imports and so forth as you will. However the version numbers give you important static information about the state of the Var in its Namespace.

If I have some def which I entered at the REPL, when I do something to ns backing file and reload it, that def persists. But, its version is now out of sync with the Namespace it exists within. When we reloaded the file, the Namespace's version was incremented and then every definition was reevaluated. If my code is still using an old symbol which is not textually present in the file and was was not reloaded, its version doesn't increase. Now the compiler can detect the version disparity and emit warnings that I'm using old code.

While this solves dependency freshness issues within a single def and a single Namespace, solving them globally requires more leg work. The compiler needs to be adapted to track the use and reach sets of each compiled expression, and to inter that information into the Var binding forms so that it's possible for a newly compiled form to emit warnings about transitive use of stale vars.

These changes don't alter the semantics of the Namespace at all outside of reloading. If you evaluate a new def or a single form in a Namespace it works just as before. However if you take this whole file reloading approach, the language can offer added support which papers over a rather painful pitfall.

A related change is that Namespaces can be made to purge their imports and aliases on reload. Right now, Namespaces persist imports and aliases until the VM is restarted. This leads to a nasty set of refactoring problems where you want to change the name of an alias, but can't reuse a name because aliases are never cleared. If we consider a whole Namespace and its source file(s) as the unit of compilation we can again do better. This is another language level change which would meaningfully improve my day to day use of the language without breaking existing use patterns.

Both of these changes turn out to be quite simple to implement. In the course of only a few days I went from the concept of these changes to a build of Clojure (pr, build) which features Namespace clearing and Var versions. I'm using now daily because it better supports the way that I want to work with the language.

Waxing Political

Rich, Stuart Sierra, Stu Halloway, Alex and the rest of the Clojure/Core crew (see Clojure/Huh?) built Clojure and have shepherded it this far. I'm immensely grateful to them for the work they've done to date in building a tool which I've had great fun using. But I think our priorities diverge.

From what I've been able to gather by talking to people who've been around longer and what Rich has written, it should be clear that Clojure is not a community project in the same way Python or Rust is. Clojure is Rich's personal project, which he is so kind as to share with the rest of us and in in its refinement Rich chooses to accept some amount of input from us as users. I'm somewhat ashamed to admit that it took me two years to realize this.

Discussion of particulars, and the reasons for the status quo is I think a somewhat useless exercise. Rich has his project and administers it as he sees fit. As Clojure is Rich's project, the priority of its governance is to conserve Rich's time and it has repeatedly been made clear that there is no desire to change this. This is an eminently reasonable goal I can find no fault in. Rich doesn't owe me or anyone else jack, let alone time. However the consequence of this structure and of not delegating is the slow and steady path which frustrates my desires for more rapid iteration.

A Fork in the Road

With loyalty the path of the last 18 months and clearly one of frustration, and voice ineffective due to the (reasonable!) priorities of Clojure/Core, The only course of action left to me is some form of exit.

I think that there's a lot of possible improvements that can still be made to the language. Currently there doesn't seem to be a lot of appetite for exploring that in Clojure itself. The namespace reloading stuff sketched above is just the first thing that popped to mind when I started thinking about how to improve my day to day experience and implementing that alone has shaken loose a number of other small but worthy improvements.

The consequence of this opinion is that I've started Jaunt.

Jaunt is my personal fork of Clojure which seeks to deal with the various organizational and technical limitations expressed above.

Technically, with Jaunt I want to explore a space of refinements to the language such as the Namespace reloading changes sketched above which add value without breaking compatibility with the ecosystem of libraries and tools like CIDER which I've come to rely on every day.

Organizationally, I want Jaunt to be a community driven affair. I don't claim that my judgment is infallible, nor do I think I'll have all the good ideas, nor will I have infinite time to consider changes even were these restrictions lifted. I'd love to involve contributors who want to bring improvements to the project within the stated bounds of preserving compatibility at least with the documented parts of Clojure.

Demo

Because otherwise how do you believe that any of this works, here's a script that'll build a empty leiningen project in a temporary directory and run a quick demo of some of the stale Var stuff.

No Promises

If you want a stable language. If you're running code in production. If you don't want to get down and dirty with the language implementation or the development process, stick with Clojure.

As the EPL states:

THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF ... FITNESS FOR A PARTICULAR PURPOSE.

Jaunt should be a drop in replacement for Clojure. I want to keep it compatible. I foresee no reason to break the documented Clojure API. But some amount of drift is inevitable. I've already moved some stuff around inside the Java implementation. If you were depending on those undocumented yet public implementation details, all bets are off. I've changed how metadata behaves on vars (I think it's a better approach but it does impose behavioral differences). I've added deprecated warnings. I added clojure.core/*line* and clojure.core/*column* (pr) which the compiler totally had internally and just didn't expose forcing weirdness. And this list of differences (perhaps improvements) is just going to grow.

Whether Jaunt proves to be something I depend on in the future, whether I wander off into the land of types and leave it to someone else or whether it diverges too much from Clojure and dies as a result of the weight of its mutations, I've overheard enough interest in a project such as this that I think this is a worthwhile endeavor and I hope you'll join me in this little adventure.

Update: Clojarr has been renamed to Jaunt, updated article accordingly.

^d

XML and Other Horrors

Those of you who follow me on Twitter may be aware of my recent travails with a bogeyman I've named only as "XML". Having banged my head against this abomination of software for several days, it's venting time so buckle in for the ride.

The Hell Am I Building

The summer I was working at Calxeda, my friends @frozenfoxx and @Sweet_Grrl got me hooked on a tabletop game, Warmachine by the Privateer Press aka PP. At a high level since PP's digital presence is pretty last decade, Warmachine is a tabletop miniature army game in the 2.5cm to 12cm base diameter size range. Models and units of models have statistics in the classical D&D style, and players engage in deathmatch and king of the hill style games.

Compared to Warhammer 40k which is probably better known, Warmachine features a much more steampunk theme & universe, simpler rules of play, lower entry price point and a greater focus on balance of play rather than on army construction and model specialization.

In short, Warmachine is a player's game while Warhammer is a modeler's game.

Unfortunately, a few realities of this game genre assert themselves.

  1. Games are slow and take up a lot of space
  2. Games are only possible at the other player's convenience
  3. Exploring game balance questions is difficult b/c play is time-hard
  4. You've got to buy before you try much of the time

As a student with homework, I simply haven't been able to attend Austin's various weekly Warmachine evenings in about a year which means it's a long time since I've gotten to play a game I enjoy a lot. Unlike Starcraft or Dota I can't just drop into a game at my convenience. But what if I could?

I've previously messed with AIs for Magic: the Gathering, so the idea at some point asserted itself that I could (or even should) try to build a full up simulator for Warmachine IP issues be damned.

The What

There are two "moving pieces" to this project. Or maybe prior art is the better term. Sebastien Laforet, here and after referred to as the Frenchman, built an Android app which is capable of rendering data files encoding the rules of play and model statistics for the game. Great! Someone else already typed all that in for me. In English, but typed it in. These files are available on github and are "just" XML. More on this in a bit.

Two other people, "Hobo" and "PG_Corwin" build a tile set for the VASSAL tabletop simulation engine. Unfortunately VASSAL is just a sprite engine with no understanding of the rules or this whole project would be moot. Treating the module as a JAR yields a whole bunch of sprites in the /images/ tree, so there's "free" art to work with as well.

The Devils in the Data

In theory this should be dead easy. The rules files are a full enumeration of every known piece in the game, and the sprites package has an image for most of them, so just wire them together into a data store of some sort and I'd be done from an assets point of view. I'd just need to well build the graphics engine and figure out how to simulate the ruleset.

Unfortunately this isn't so simple for a couple reasons.

First of all, the XML files Sebastien put together are designed to do one thing and one thing only: encode the rule text on the playing cards associated with each model and unit. Cards are generally formatted with an image, some fluff and statistics on the front and rules on the back. So for instance this recently released model, Coleman Stryker v3 has a model and stat card face as such:

So we can slap all this in XML and it works just fine (with serious elisions for length)

<warcaster id="Yz02"
           name="Stryker3"
           full_name="Lord General Coleman Stryker"
           qualification="Cygnar Epic Cavalry Warcaster"
           focus="6"
           warjack_points="5"
           fa="C"
           completed="true">
  <basestats name="Stryker"
             spd="8" str="6" mat="7" rat="6" def="15" arm="16" cmd="10"
             hitpoints="18" immunity_electricity="true" />
  <weapons>
    <melee_weapon name="Quicksilver MKIII"
                  pow="8" p_plus_s="14"
                  magical="true" reach="true">
      <capacity title="DISRUPTION">
        ....
      </capacity>
    </melee_weapon>
    <mount_weapon name="Mount" pow="10">
      <capacity title="THUNDEROUS IMPACT">
        ....
      </capacity>
    </mount_weapon>
    <ranged_weapon name="Quicksilver Blast"
                   rng="8" rof="1" aoe="-" pow="14"
                   magical="true" electricity="true">
      <capacity title="DISRUPTION">
        ....
      </capacity>
    </ranged_weapon>
  </weapons>
  <feat title="LIGHTNING CHARGE">
    ....
  </feat>
  <spell name="ARCANE BOLT"
         cost="2" rng="12" aoe="-" pow="11" up="NO" off="YES">....</spell>
  <spell name="CHAIN BLAST"
         cost="3" rng="10" aoe="3" pow="12" up="NO" off="YES">....</spell>
  <spell name="ESCORT"
         cost="2" rng="SELF" aoe="CTRL" pow="-" up="YES" off="NO">....</spell>
  <spell name="FURY"
         cost="2" rng="6" aoe="-" pow="-" up="YES" off="NO">....</spell>
  <spell name="IRON AGGRESSION"
         cost="3" rng="6" aoe="-" pow="-" up="YES" off="NO">....</spell>
  <capacity title="ELITE CADRE [STORM LANCES]" type="">....</capacity>
  <capacity title="FIELD MARSHAL [ASSAULT]">....</capacity>
  <capacity title="PLASMA NIMBUS">....</capacity>
</warcaster>

So we have some meaning for a <warcaster> which is a title and a specialization of <model>, we have <basestats> which is common to pretty much everything, we have a <weapons> block with a list of <weapon> entries each of which has some attributes (rules) named <capacity> and this is all pretty sane. Writing a set of functions which can walk this tag tree using clojure.xml was really simple, and worked great, and all you have to do is reduce over all this tag soup with some bindings and a state object to build a representation of a given model.

That code has been working for about 12 months, because shortly thereafter batteries of special cases appeared from the woodwork and fouled my initial efforts on this up.

See as in Magic: the Gathering, Warmachine derives a lot of well value from novelty and from mixing things up in the ruleset. This means that it's not uncommon to find things which aren't what they appear to be. The newest faction in the game has this awesome model:

The text at the bottom "Battle engine and solos" is the important bit. The TEP itself is what is called a Battle Engine and has a whole battery of rules associated with that concept. However it comes in a box with three other models which aren't part of this Battle Engine entity, they are independent during play hence the use of the word Solo. So... what the hell is this thing and how do we represent it?

Well this is our Frenchman's answer, again with omissions:

<battleEngine id="cocE01"
              full_name="Transinfinite Emergence Projector & Permutation Servitors"
              cost="9" qualification="Convergence Battle engine" fa="2"
              completed="true">
  <basestats spd="5" str="10" mat="0" rat="4" def="10" arm="19" cmd="10"
             hitpoints="20" construct="true" pathfinder="true"/>
  <weapons>
    <ranged_weapon name="Aperture Pulse"
                   rng="SP10" aoe="-" pow="10" rof="1" location="-">
      <capacity title="AUTO FIRE[2]">....</capacity>
      <capacity title="FIRING FORMULAE">....</capacity>
    </ranged_weapon>
  </weapons>
  <capacity title="GUN PLATFORM">....</capacity>
  <capacity title="SACRIFICIAL PAWN [PERMUTATION SERVITORS]">....</capacity>
  <capacity title="SERVITOR SATELLITES">....</capacity>
  <capacity title="STEADY">....</capacity>
  
  <model id="Permutation Servitors" full_name="Permutation Servitors">
    <basestats spd="6" str="3" mat="5" rat="5" def="12" arm="13" cmd="0"
               construct="true" pathfinder="true"/>
    <capacity title="ORBIT">....</capacity>
    <capacity title="STEADY">....</capacity>
  </model>
</battleEngine>

Did you catch that? <battleEngine> NESTS another <model>! From the WHAC app, which seeks to duplicate the cards that come in the box this makes sense. There's a card which represents the TEM, and there's another card which gives the statistics for each of the spawnable Servitors. Also, <battleEngine> as a tag is implicitly dealt with as <model>, same as <warcaster> was earlier.

Rendering in the app this looks just fine, because you render the "top" model being the big ol' TEM anyone reading the cards cares about and the Servitors are secondary and so don't occur as a primary listing.

Unfortunately this is nonsense from my point of view as a consumer trying to extract structured information. The TEM is a <model> in and of itself, as are the Servitors, and they come together in a package that PP calls PIP 36028.

A similar problem applies to units. It turns out that the Frenchman encodes units by using this implicit <model> behavior. So a unit will be named "Foo, Leader & Grunts", use the implicit <model> tag, just have <basestats>, some weapons and capacities and that'll be all. Which sorta works.

Unfortunately it falls apart in units with models that aren't all the same. I've wasted enough length on images and listings here, but go check out the encoding of the Black 13th if you care. The Black 13th is composed of three characters, being Lynch, Watts and Ryan each of whom is unique. They are encoded using the implicit <model> on their shared <unit> to describe Lynch, the leader, and the other two have their own nested <model>s.

Again from a rendering standpoint this makes sense, but really what I want is something like this:

<unit name="the Black 13th" id="...">
  <model name="Lynch">
    ...
  </model>

  <model name="Ryan">
    ...
  </model>

  <model name="Watts">
    ...
  </model>
</unit>

because that would make sense.

The even more degenerate case of this is unfortunately the common case of an infantry unit: "Foo, Leader & Grunts" where the leader will get the implicit <model> and there'll be a <model name="grunts"> or something in its body. This is really a problem because usually the officer has a different sprite than the grunts, and the grunts need to have an ID in order to have a sprite permanently associated with them. Guess what attribute they may not have.

And don't get me started on the spelling errors and qualification mistakes and soforth I've found in these files as a result of trying to automate parsing them.

The Devils in the Sprites

The sprites are their own little shitshow. There isn't a naming convention to all of the files as they were done by several artists over the course of years. Most of the time they're named with substrings or abbreviations for a model's name, so some regex voodoo is able to recover about a third of the model to file associations but for most of it pfft I got to spend last night hand matching data file model IDs to sprite names.

I'm down to about 250 entities which aren't associated with a sprite, and I built a three page webapp to speed up building these associations, so it could be worse but it's still painful.

Why do this? Well because the XML file that's in the root of the VASSAL module and describes all the sprite to model associations isn't actually valid XML, it's some VASSAL specific nonsense that I can't seem to parse.

So yeah. XML and other people's data can go burn.

The question now before me is whether I try to do awful things to my XML file parser so I can continue consuming Sebastien's data file format non-deal as it is in the hope that as he maintains it I can "just" import changes or whether I want to for my own sanity just refactor these files and walk away from any future work of Sebastien's in the interest of quality and sanity of parsed results.

End Of Rant

Hacking like it's 2288

I'm an a diehard Fallout franchise fan. While school currently precludes me from playing, I took some time to automate solving the hacking minigame which is used to unlock computer terminals and secured areas in-game.

When "hacking", a player is presented with a list of words one of which is the "password" and has five (or fewer) tries to try and find the password. Every time the user selects a candidate, a response of how many characters match between the answer and the candidate is given so that the user can refine their guess.

The strategy for playing this game is uninteresting at best, you choose an initial word either heuristically or randomly, and then based on the number of characters which that word shares with the answer you can refine the whole list of words by removing all words which do not share exactly that many characters with the word you tried. This simple guess and refine algorithm one need only iterate until only one answer remains.

While an obvious algorithm, it's a little tedious to do this by hand because there's no good way to keep track of all the candidates which have been removed from consideration. Also computing string overlaps is boring. Given an algorithmic problem, write an algorithmic solition! What do we need to solve this problem in Clojure?

Well first we need our word "intersection" scoring function. To do this, we'll take the frequencies of the left word, the frequencies of the right word. Since each character can only correspond to another single character, this means that given N as on the laft and M bs on the right, we have (min N M) common characters. We can simply compute this common character count for every character in the left string (characters occuring only in the right string won't factor in anyway) and take the sum over those counts.

(defn letters-in-common [left right]
  (let [fl (frequencies left)
        fr (frequencies right)]
    (->> (for [[k vl] fl
               :let   [vr (get fr k 0)]]
           (min vl vr))
         (apply +))))

(letters-in-common "foo" "foooo")
;; => 3
(letters-in-common "cat" "dog")
;; => 0
(letters-in-common "cat" "rat")
;; => 2

So now we need a guess making function which we can write in terms of our scoring function. What word do we want to choose as our guess? Well we want to guess the word which will give us the most information about all the other words, that is to say is most similar to as many other words as possible.

Once we've made a guess we can use the word similarity score we get back to refine our dictionary and contrain our search space.

This must be the optimal choice by the usual greedy choice argument, since any other word we could guess would tell us less about the other words, and in the case of a tie (two words with equal similarity to all other words) we can't know without guessing which one is the better choice so we can choose randomly.

(defn make-guess
  "Given a population of words, locates and returns a word with maximum
  \"matching potential\", that is the word which shares the most
  characters with every other word.

  If there is a tie, since the two words have equal potentials it
  doesn't matter which one we choose and so the choice is arbitrary.

  Makes use of scoped memoization for performance on lots of words,
  but realistically this should never be a factor."
  [words]
  (let [-score-fn (memoize
                   (fn [col]
                     (let [[l r] (seq col)]
                       (letters-in-common l r))))
        score-fn  (fn [word]
                    (let [s (sorted-set word)]
                      (->> (for [w words
                                 :when (not= w word)]
                             (-score-fn (conj s w)))
                           (apply +))))]
    (last (sort-by score-fn words))))

Now we need our dictionary pruning function. If a word does not have exactly n letters in common with the last guess where n is the reported similarity of the last guess with the answer, that word cannot be a solution.

We know that for multiple guesses, no word could be an answer which does not have the reported similarity with all previous guesses. Thus we don't accidentially exclude any possible answers by simply filtering the dictionary as such after each guess.

(defn trim-pop [words guess commonality]
  (for [w     words
        :when (= commonality (letters-in-common w guess))]
    w))

(trim-pop ["foo" "bar" "cat" "rat"] "cat" 2)
;; => ("rat")

So lets put these two to use! We'll wire up make-guess to letters-in-common which is our scoring function anyway and just let it run on a small dictionary and a goal word. Just to make the point that this algorithm does converge.

;; An example hacking loop. The guess and refine algorithm with a
;; sample input.
(let [words ["LOWER" "CREED" "JAMES" "CAGES" "CARES" "OFFER" "CAVES" "TIRED"]
      answer "LOWER"]
  (loop [words words]
    (when (= (count words) 1)
      (first words)
      (let [guess  (make-guess words)
            score  (letters-in-common guess answer)
            words' (trim-pop words guess score)]
        (println "Guessed" guess "reduced population to" words')
        (recur words')))))

Yay! So that totally works.

And because I don't think anyone wants to manually run all that state through a REPL while trying to play Fallout 4, here's a quick REPL wrapper to help with hacking.

(defn autohack [words]
  (if-not (= 1 (count words))
    (let [guess (make-guess words)
          _     (println "Try>" guess)
          score (read)]
      (recur (trim-pop words guess score)))
    (println "It's" (first words))))

Happy wandering!

^d

Shitposting Vol 1

Subtitle: That Time @Snowden Shot Back.

I win.