Re: some aspects of our qsort might not be ideal

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: some aspects of our qsort might not be ideal
Date: 2022-06-23 10:12:58
Message-ID: CAFBsxsHWituORSTKvvAL7bmDmQ-qy580Mnq=7BFrPY7-0qg_NQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 3, 2022 at 10:44 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> What about dual-pivot quicksort, which is used in Java 7+? That is the
> defacto successor to Bentley & McIlroy. In fact, Jon Bentley himself
> collaborated with its author, and provided benchmarking input. The
> underlying philosophy is essentially the same as the original -- it
> is supposed to be an "industrial strength" quicksort, with various
> adversarial cases considered directly.

Here is a *rough* first pass at dual-pivot quicksort. I haven't
changed the regression tests to adjust for qsort being an unstable
sort, and there is some dead code. I looked to a couple JDKs for
examples of design decisions as a first approximation. It includes a
module with a few debugging printouts, run like so: select
debug_qsort_int(array[7,6,5,4,3,2,1]);

Pivot selection: A common way is to pick five candidates (here: mid,
+/- 1/7, +/- 2/7), sort them in place, then pick the 2nd and 4th of
them, or "tertile of five". If they are all evenly spaced, we can do
insertion sort.

Fall back to single-pivot: If any of the pivot candidates are equal,
JDK assumes there could be a large number of duplicates and falls back
to single-pivot, using the median of the five. I've done the same
here. Despite having two code paths in all of our template instances,
the VM binary is only about 5kb bigger, since MED3 is no longer built.

Performance testing: Not started yet. I'm thinking an apples-to-apples
comparison is not insightful enough, since the current insertion sort
threshold 7 is already a bit constrained for single-pivot, and would
be even more so for dual pivot, especially since 5 of the elements are
pre-sorted to choose the pivots. My plan is to retest the threshold on
HEAD using my latest tests, then use that as a starting point to test
thresholds with dual-pivot.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1-0001-Create-internal-workhorse-for-ST_SORT.patch text/x-patch 2.5 KB
v1-0002-Implement-dual-pivot-quicksort.patch text/x-patch 10.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Borisov 2022-06-23 11:26:04 Re: Custom tuplesorts for extensions
Previous Message Julien Rouhaud 2022-06-23 10:11:55 Re: NAMEDATALEN increase because of non-latin languages