I wrote this article consecutively to the previous one - The other side of the table: Simon Peyton-Jones. After I finished reading Simon's reply, I thought that it would be interesting to write a bit about randomized algorithms. This post should be straightforward. Some later parts require some familiarity with probability and discrete mathematics, but the rest should make sense even if you skip the more math-intensive parts.Randomization has become a standard approach in algorithm design and it can be applied in a wide range of problems. Often it leads to more efficient, simpler and shorter solutions than their deterministic counterparts. As opposed to a deterministic algorithm, a randomized algorithm takes a source of random numbers and makes random choices during execution. Hence, the behavior of the algorithm can change even on a fixed input and is likely to be good on every input.
The first class of randomized algorithms are Las Vegas algorithms, which always give correct results, the only variation from one run to another being its running time. The running time is a random variable whose expectation is bounded, for example by a polynomial. The second class are Monte Carlo algorithms, whose running time is deterministic, but whose results are only correct with a probability of
One of the most widely used randomized algorithms is the randomized Quicksort, which I will talk about. I picked it because it is very practical, easily comprehendible and the implementation is trivial. It is an example of a Las Vegas algorithm. Also, many of you are probably already familiar with either the normal (deterministic) or the randomized version, or both.
Quicksort

Quicksort is a divide and conquer sorting algorithm developed by Tony Hoare, who published the first version in the Communications of the ACM, Volume 4, Issue 7, in July 1961. (Oh, just in case any Slashdotters happen to read this: No, the fact he is a senior researcher at Microsoft Research does not make him an evil person.)
The algorithm works by first partitioning (dividing/rearranging) an input array containing the elements to be sorted into 2 sub-arrays around an element called the pivot
Anyhow, we all know that code says more than a thousand words, so here we go. The following code is one of the simpler implementations of Quicksort, without any optimizations, as it appeared in Steven Skiena's The Algorithm Design Manual
void quicksort(int s[], int l, int h)
{
int p; /* index of partition */
if ((h - l) > 0) {
p = partition(s, l, h);
quicksort(s, l, p - 1);
quicksort(s, p + 1, h);
}
}
int partition(int s[], int l, int h)
{
int i;
int p; /* pivot element index */
int firsthigh; /* divider position for pivot element */
p = h;
firsthigh = l;
for (i = l; i < h; i++)
if(s[i] < s[p]) {
swap(&s[i], &s[firsthigh]);
firsthigh++;
}
swap(&s[p], &s[firsthigh]);
return(firsthigh);
}
void swap(int *a, int *b)
{
int x;
x = *a;
*a = *b;
*b = x;
}
{
int p; /* index of partition */
if ((h - l) > 0) {
p = partition(s, l, h);
quicksort(s, l, p - 1);
quicksort(s, p + 1, h);
}
}
int partition(int s[], int l, int h)
{
int i;
int p; /* pivot element index */
int firsthigh; /* divider position for pivot element */
p = h;
firsthigh = l;
for (i = l; i < h; i++)
if(s[i] < s[p]) {
swap(&s[i], &s[firsthigh]);
firsthigh++;
}
swap(&s[p], &s[firsthigh]);
return(firsthigh);
}
void swap(int *a, int *b)
{
int x;
x = *a;
*a = *b;
*b = x;
}
I will skip an elaborate analysis of the deterministic Quicksort, as we want to emphasize on the randomized version. Most of you are likely to know that Quicksort has an expected running time of
Randomized Quicksort
The above implementation automatically selects the last element in each sub-array as the pivot. We will now look at a technique called random sampling. So instead of always taking the last element in each sub-array, we will select a random element as the pivot during the partitioning. One possible implementation based on our previous code looks like this:
int partition(int s[], int l, int h)
{
int i;
int p; /* pivot element index */
int firsthigh; /* divider position for pivot element */
p = l + (random() % (h - l + 1);
swap(&s[p], &s[h]);
firsthigh = l;
for (i = l; i < h; i++)
if(s[i] < s[h]) {
swap(&s[i], &s[firsthigh]);
firsthigh++;
}
swap(&s[h], &s[firsthigh]);
return(firsthigh);
}
{
int i;
int p; /* pivot element index */
int firsthigh; /* divider position for pivot element */
p = l + (random() % (h - l + 1);
swap(&s[p], &s[h]);
firsthigh = l;
for (i = l; i < h; i++)
if(s[i] < s[h]) {
swap(&s[i], &s[firsthigh]);
firsthigh++;
}
swap(&s[h], &s[firsthigh]);
return(firsthigh);
}
Well, so what are the advantages of the randomized version?
- The running time is independent of input ordering.
- The algorithms does not makes any assumptions about the distribution of the elements in the input array. Already sorted arrays will work as well as random ones.
- No specific input that can produce the worst case behavior, which is determined only by the random number generator.
Analysis of randomized Quicksort
First of all, let's assume that no two elements in the input array are equal. We will be writing will be
the quantity we care about (the total number of comparisons) as a sum of simpler random variables, and then just analyze the simpler ones. Define a random variable
and therefore
Let's consider one of these
In other words, for a given element i, it is compared to element
The quantity
Summoning the evil
In 1999 Doug McIlroy from Dartmouth College published a paper called A Killer Adversary for Quicksort which describes a simple way to find the inputs that even force the randomized Quicksort into the worst case running time of Quicksort can be made to go quadratic by constructing input on the fly in response to the sequence of items compared. The technique is illustrated by a specific adversary for the standard C qsort function. The general method works against any implementation of quicksort–even a randomizing one–that satisfies certain very mild and realistic assumptions.Russ Cox wrote a post about this paper, too.
Conclusion and one more thing
This is just a very trivial example of randomized algorithms and many of you are likely to be already familiar with it. Though, if this post happens to attract some interest in this topic, I might write up some more posts about randomized algorithms. If you want to learn more about Quicksort and the analysis of both, the randomized and the deterministic version, there is a good video lecture available at MIT OpenCourseWare here.
I chose to post source code in C, as I think that it is one of the languages most of you know, that it is more practical than pseudocode and that at the same time it is rather easy to read. If you happen to not be happy with my decision, feel free to post an implementation in your favorite programming language as a comment.
Also, as this is my first rather deep post, if you happen to find an error in this post, please leave a comment and I will correct the mistake. Unfortunately, though, I won't be able to reward you with a hexadecimal dollar as Donald Knuth does.
Update: A few more links to resources related to this topic are posted over here. Also I will write a real follow-up on this topic when I find some time for it.
Labels: Algorithms, C, Programming


Post a Comment
Some HTML tags for basic formatting are working. Unfortunately Blogger won't allow 'pre' and 'code' tags. If you want to paste code, please do it plain. Comments are displayed in a fixed-width font.