Sorting for $500

04 Feb 2013

It was said oft and loud in my freshman year algorithms class that the algorithms and datastructures we were covering would be hot topics for interviews and now I know why. While they may appear simple in class they prove easy to forget and relatively hard to reinvent on the spur of the moment. I thought that my personal demon of choice was depth first traversals but I was wrong. I grocked depth first work when we did it in class: but quicksort (qsort) and mergesort (msort) I never quite got straight.

The confusing thing to me about qsort and msort is that they are both fundamentally recursive sorting solutions which didn’t really differ that much in my head when I passed algos and since I haven’t sat down to implement an on-disk b-tree or other heavyweight sorting problem I just haven’t stayed up to speed on them. As you can probably guess I had another interview today and I mixed the two up. I was quoted claiming quite falsely that quicksort had a merge step and was only headed off by a merciful interviewer who wanted me to shut up and implement the (merge) operation which I did manage to pull off correctly.

However, this is about remembering merge sort vs quicksort so I’ll get to it. Also fair warning this is the debut of gists instead of <&code> elements for code insertion into this blog. We’ll see how they shake out, should be OK what with my relatively wide main column.

Quicksort

The quicksort concept is, given a sequence S, choose a “pivot” value P and compute the subsequences L and R such that all values in L are strictly less than the value of P and those in R are greater than or equal to (allowing for duplicate values). The recursion case is that the quicksort of S is the sequence (concat (quicksort L) (list P) (quicksort R)), with cases for the sequences L and R being empty or the case (quicksort []) -> [] provided for. This amounts to recursively decomposing the sequence S into a sequence of strictly increasing values which are then composed as the return part of the recursion into the resulting ordered sequence.

Because of the nature of Clojure’s (remove) operation, the (%remove-first) function is called for to prevent the key from being incorrectly included in either the left or right subsequences unless there are duplicates of the pivot value which a simple (remove) would kill.

Mergesort

Mergesort on the other hand is based on the similar concept that you can “merge” two ordered sequences of lengths M and N in time M+N, and that computing the subsequences L and R of S such that L and R are the left and right halves of S is also trivial. This gives quicksort to be recursively the (sorted) merge of the sequences L and R.

Owning it

Okay, well that’s not bad. Merge sort centers around the “merge” operation whereas quicksort centers around the pivot hence the pivot choice problem and the “center element” heuristic.

Edits

There are updates to the gists embedded above because I realized more textually efficient expressions for both qsort and msort which try to eliminate special cases where possible.