Joseph K. Myers

Tuesday, November 30, 2004

Median Choice

We describe a method in general terms which allows opportunistic selection of the median of data, in a way useful for sorting. We do not discuss particulars of implementation; these may be varied, and our precise algorithm may be perused within the source code.

These are the steps of our algorithm:

1) select two elements and compare them.

2) select a third element and use it to establish bounds, where the highest and lowest of the three elements set the upper and lower bound

3) choose a fourth element. If it falls between the bounds, then proceed to set new bounds by choosing the second and third ranked elements.

4) proceed in this fashion, where the bounds are adjacent for even number of elements, and the bounds are separated by the median for an odd number of elements.

Cease when any element falls out of bounds.

Use this approximate median choice as a pivot for the quicksort algorithm[1].

We now display charts comparing our method to existing implementations of quicksort within the Unix C library (see qsort[3].) and to further optimized versions of those[2].

The input size in count of elements is given on the x-axis. The time in microseconds is given for sorting on the y-axis.

We will distinguish between three types of data for sorting: data where each element is unique within the set of input is said to be dense; data with pre-existing order is said to be non-random; data which is sorted is said to be ordered.

To represent random data, we must use two categories: random dense data, and random sparse data. We use a set of 32-bit values to represent the first, and a set of 8-bit values for the second.

We call this test uchar-input for 8-bit values.

uchar input

We call the test of 32-bit random input uint-input.

uint input

Similarly, we define a test uchar-clusters/uint-clusters to define 8-bit and 32-bit non-random data.

uchar

uchar clusters

uint

uint clusters

To represent ordered data, we use uchar-input-sort/uint-input-sort.

uchar

uchar input-sort

uint

uint input-sort

We have tried to design our program with three goals in mind:

1) Sort common datasets quickly.

2) Sort random input quickly.

3) Sort ordered input quickly.

The test results seem to show our goals have been accomplished, although there is more work to be done.

The theory of sorting and the idea of opportunistic convergence to reduce the sorting task complexity, and the principle of pre-existing order in data are sstudied more in detail in other literature which is not publicly released.

Appendices

1. data-files.

References

1. Myers JK. An Optimal Quicksort Partitioning Algorithm for PPC and x86 Architectures. 2004. <http://www.myersdaily.org/joseph/unix/trisort.txt>

2. Myers JK. Sort. <http://www.myersdaily.org/joseph/unix/sort.html>

3. BSD. qsort-freebsd.c. <http://www.funet.fi/pub/files/qsort-freebsd.c>

4. Myers JK. Sorting studies. <http://www.myersdaily.org/joseph/unix/sort/>