Sorting for $500
04 Feb 2013It was said oft and loud in my freshman year algorithms class that the algorithms and datastructures we were covering would be hot topics for interviews and now I know why. While they may appear simple in class they prove easy to forget and relatively hard to reinvent on the spur of the moment. I thought that my personal demon of choice was depth first traversals but I was wrong. I grocked depth first work when we did it in class: but quicksort (qsort) and mergesort (msort) I never quite got straight.
The confusing thing to me about qsort and msort is that they are both
fundamentally recursive sorting solutions which didn’t really differ
that much in my head when I passed algos and since I haven’t sat down
to implement an on-disk b-tree or other heavyweight sorting problem I
just haven’t stayed up to speed on them. As you can probably guess I
had another interview today and I mixed the two up. I was quoted
claiming quite falsely that quicksort had a merge step and was only
headed off by a merciful interviewer who wanted me to shut up and
implement the (merge)
operation which I did manage to pull off
correctly.
However, this is about remembering merge sort vs quicksort so I’ll get
to it. Also fair warning this is the debut of gists instead of
<&code>
elements for code insertion into this blog. We’ll see how
they shake out, should be OK what with my relatively wide main column.
Quicksort
The quicksort concept is, given a sequence S, choose a “pivot”
value P and compute the subsequences L and R such that all values in L
are strictly less than the value of P and those in R are greater than
or equal to (allowing for duplicate values). The recursion case is
that the quicksort of S is the sequence (concat (quicksort L) (list
P) (quicksort R))
, with cases for the sequences L and R being empty
or the case (quicksort []) -> []
provided for. This amounts to
recursively decomposing the sequence S into a sequence of strictly
increasing values which are then composed as the return part of the
recursion into the resulting ordered sequence.
Because of the nature of Clojure’s (remove)
operation, the
(%remove-first)
function is called for to prevent the key from being
incorrectly included in either the left or right subsequences unless
there are duplicates of the pivot value which a simple (remove)
would kill.
Mergesort
Mergesort on the other hand is based on the similar concept that you can “merge” two ordered sequences of lengths M and N in time M+N, and that computing the subsequences L and R of S such that L and R are the left and right halves of S is also trivial. This gives quicksort to be recursively the (sorted) merge of the sequences L and R.
Owning it
Okay, well that’s not bad. Merge sort centers around the “merge” operation whereas quicksort centers around the pivot hence the pivot choice problem and the “center element” heuristic.
Edits
There are updates to the gists embedded above because I realized more textually efficient expressions for both qsort and msort which try to eliminate special cases where possible.