Re: qsort, once again

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Dann Corbit" <DCorbit(at)connx(dot)com>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Jerry Sievers" <jerry(at)jerrysievers(dot)com>
Subject: Re: qsort, once again
Date: 2006-03-21 18:52:17
Message-ID: 18732.1142967137@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Dann Corbit" <DCorbit(at)connx(dot)com> writes:
> Well, my point was that it is a snap to implement and test.

Well, having done this, I have to eat my words: it does seem to be a
pretty good idea.

The following test numbers are using Bentley & McIlroy's test framework,
but modified to test only the case N=10000 rather than the four smaller
N values they originally used. I did that because it exposes quadratic
behavior more obviously, and the variance in N made it harder to compare
comparison ratios for different cases. I also added a "NEARSORT" test
method, which sorts the input distribution and then exchanges two
elements chosen at random. I did that because I was concerned that
nearly sorted input would be the worst case for the presorted-input
check, as it would waste the most cycles before failing on such input.

With our existing qsort code, the results look like

distribution SAWTOOTH: max cratio 94.17, min 0.08, average 1.56 over 105 tests
distribution RAND: max cratio 1.06, min 0.08, average 0.51 over 105 tests
distribution STAGGER: max cratio 6.08, min 0.23, average 1.01 over 105 tests
distribution PLATEAU: max cratio 94.17, min 0.08, average 2.12 over 105 tests
distribution SHUFFLE: max cratio 94.17, min 0.23, average 1.92 over 105 tests
method COPY: max cratio 6.08, min 0.08, average 0.72 over 75 tests
method REVERSE: max cratio 5.34, min 0.08, average 0.69 over 75 tests
method FREVERSE: max cratio 94.17, min 0.08, average 5.71 over 75 tests
method BREVERSE: max cratio 3.86, min 0.08, average 1.41 over 75 tests
method SORT: max cratio 0.82, min 0.08, average 0.31 over 75 tests
method NEARSORT: max cratio 0.82, min 0.08, average 0.36 over 75 tests
method DITHER: max cratio 5.52, min 0.18, average 0.77 over 75 tests
Overall: average cratio 1.42 over 525 tests

("cratio" is the ratio of the actual number of comparison function calls
to the theoretical expectation, N log2(N))

That's pretty awful: there are several test cases that make it use
nearly 100 times the expected number of comparisons.

Removing the swap_cnt test to bring it close to B&M's original
recommendations, we get

distribution SAWTOOTH: max cratio 3.85, min 0.08, average 0.70 over 105 tests
distribution RAND: max cratio 1.06, min 0.08, average 0.52 over 105 tests
distribution STAGGER: max cratio 6.08, min 0.58, average 1.12 over 105 tests
distribution PLATEAU: max cratio 3.70, min 0.08, average 0.34 over 105 tests
distribution SHUFFLE: max cratio 3.86, min 0.86, average 1.24 over 105 tests
method COPY: max cratio 6.08, min 0.08, average 0.76 over 75 tests
method REVERSE: max cratio 5.34, min 0.08, average 0.75 over 75 tests
method FREVERSE: max cratio 4.56, min 0.08, average 0.73 over 75 tests
method BREVERSE: max cratio 3.86, min 0.08, average 1.41 over 75 tests
method SORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests
method NEARSORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests
method DITHER: max cratio 3.73, min 0.18, average 0.72 over 75 tests
Overall: average cratio 0.78 over 525 tests

which is a whole lot better as to both average and worst cases.

I then added some code to check for presorted input (just after the
n<7 insertion sort code):

#ifdef CHECK_SORTED
presorted = 1;
for (pm = (char *) a + es; pm < (char *) a + n * es; pm += es)
{
if (cmp(pm - es, pm) > 0)
{
presorted = 0;
break;
}
}
if (presorted)
return;
#endif

This gives

distribution SAWTOOTH: max cratio 3.88, min 0.08, average 0.62 over 105 tests
distribution RAND: max cratio 1.06, min 0.08, average 0.46 over 105 tests
distribution STAGGER: max cratio 6.15, min 0.08, average 0.98 over 105 tests
distribution PLATEAU: max cratio 3.79, min 0.08, average 0.31 over 105 tests
distribution SHUFFLE: max cratio 3.91, min 0.08, average 1.09 over 105 tests
method COPY: max cratio 6.15, min 0.08, average 0.72 over 75 tests
method REVERSE: max cratio 5.34, min 0.08, average 0.76 over 75 tests
method FREVERSE: max cratio 4.58, min 0.08, average 0.73 over 75 tests
method BREVERSE: max cratio 3.91, min 0.08, average 1.44 over 75 tests
method SORT: max cratio 0.08, min 0.08, average 0.08 over 75 tests
method NEARSORT: max cratio 0.89, min 0.08, average 0.39 over 75 tests
method DITHER: max cratio 3.73, min 0.18, average 0.72 over 75 tests
Overall: average cratio 0.69 over 525 tests

So the worst case seems only very marginally worse, and there is a
definite improvement in the average case, even for inputs that aren't
entirely sorted. Importantly, the "near sorted" case that I thought
might send it into quadratic behavior doesn't seem to do that.

So, unless anyone wants to do further testing, I'll go ahead and commit
these changes.

regards, tom lane

PS: Just as a comparison point, here are the results when testing HPUX's
library qsort:

distribution SAWTOOTH: max cratio 7.00, min 0.08, average 0.76 over 105 tests
distribution RAND: max cratio 1.11, min 0.08, average 0.53 over 105 tests
distribution STAGGER: max cratio 7.05, min 0.58, average 1.24 over 105 tests
distribution PLATEAU: max cratio 7.00, min 0.08, average 0.43 over 105 tests
distribution SHUFFLE: max cratio 7.00, min 0.86, average 1.54 over 105 tests
method COPY: max cratio 6.70, min 0.08, average 0.79 over 75 tests
method REVERSE: max cratio 7.05, min 0.08, average 0.78 over 75 tests
method FREVERSE: max cratio 7.00, min 0.08, average 0.77 over 75 tests
method BREVERSE: max cratio 7.00, min 0.08, average 2.11 over 75 tests
method SORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests
method NEARSORT: max cratio 0.86, min 0.08, average 0.56 over 75 tests
method DITHER: max cratio 4.06, min 0.16, average 0.74 over 75 tests
Overall: average cratio 0.90 over 525 tests

and here are the results using glibc's qsort, which of course isn't
quicksort at all but some kind of merge sort:

distribution SAWTOOTH: max cratio 0.90, min 0.49, average 0.65 over 105 tests
distribution RAND: max cratio 0.91, min 0.49, average 0.76 over 105 tests
distribution STAGGER: max cratio 0.92, min 0.49, average 0.70 over 105 tests
distribution PLATEAU: max cratio 0.84, min 0.49, average 0.54 over 105 tests
distribution SHUFFLE: max cratio 0.64, min 0.49, average 0.52 over 105 tests
method COPY: max cratio 0.92, min 0.49, average 0.66 over 75 tests
method REVERSE: max cratio 0.92, min 0.49, average 0.68 over 75 tests
method FREVERSE: max cratio 0.92, min 0.49, average 0.67 over 75 tests
method BREVERSE: max cratio 0.92, min 0.49, average 0.68 over 75 tests
method SORT: max cratio 0.49, min 0.49, average 0.49 over 75 tests
method NEARSORT: max cratio 0.55, min 0.49, average 0.51 over 75 tests
method DITHER: max cratio 0.92, min 0.50, average 0.74 over 75 tests
Overall: average cratio 0.63 over 525 tests

PPS: final version of test framework attached for the archives.

Attachment Content-Type Size
sorttester.c application/octet-stream 6.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Hallgren 2006-03-21 18:52:47 Re: Question about MemoryContexts and functions that returns
Previous Message Jim C. Nasby 2006-03-21 18:05:54 Re: [GENERAL] A real currency type