Re: A qsort template

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A qsort template
Date: 2021-03-03 04:17:13
Message-ID: CA+hUKGJhOtjQH+wjCodtJzHAFCAPYyt6Qm9E1mUoJph42qo1hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 3, 2021 at 10:25 AM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> > On 18 Feb 2021, at 04:09, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > In another thread[1], I proposed $SUBJECT, but then we found a better
> > solution to that thread's specific problem. The general idea is still
> > good though: it's possible to (1) replace several existing copies of
> > our qsort algorithm with one, and (2) make new specialised versions a
> > bit more easily than the existing Perl generator allows. So, I'm back
> > with a rebased stack of patches. I'll leave specific cases for new
> > worthwhile specialisations for separate proposals; I've heard about
> > several.
>
> Just to play around with this while reviewing I made a qsort_strcmp, like in
> the attached, and tested it using a ~9M word [0] randomly shuffled wordlist.
> While being too small input to make any meaningful difference in runtime (it
> shaved a hair off but it might well be within the error margin) there was no
> regression either. More importantly, it was really simple and quick to make a
> tailored qsort which is the intention with the patch. While still being a bit
> of magic, moving from the Perl generator makes this slightly less magic IMO so
> +1 on this approach.

Thanks for testing and reviewing!

> A tiny nitpick on the patch itself:
>
> + * - ST_COMPARE(a, b) - a simple comparison expression
> + * - ST_COMPARE(a, b, arg) - variant that takes an extra argument
> Indentation.

Fixed. Also ran pgindent.

> All tests pass and the documentation in the the sort_template.h is enough to go
> on, but I would prefer to see a comment in port/qsort.c referring back to
> sort_template.h for documentation.

I tried adding a comment along the lines "see lib/sort_template.h for
details", but it felt pretty redundant, when the file contains very
little other than #include "lib/sort_template.h" which should already
tell you to go and look there to find out what this is about...

I went ahead and pushed these.

I am sure there are plenty of opportunities to experiment with this
code. Here are some I recall Peter Geoghegan mentioning:

1. If you know that elements are unique, you could remove some
branches that deal with equal elements (see "r == 0").
2. Perhaps you might want to be able to disable the "presorted" check
in some cases?
3. The parameters 7, 7 and 40 were probably tuned for an ancient Vax
or similar[1]. We see higher insertion sort thesholds such as 27 in
more recent sort algorithms[2] used in eg the JVM. You could perhaps
speculate that the right answer depends in part on the element size; I
dunno, but if so, here we have that at compile time while traditional
qsort() does not.

As for which cases are actually worth specialising, I've attached the
example that Andres mentioned earlier; it seems like a reasonable
candidate to go ahead and commit too, but I realised that I'd
forgotten to attach it earlier.

It's possible that the existing support sorting tuples could be
further specialised for common sort key data types; I haven't tried
that.

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.8162&rep=rep1&type=pdf
[2] https://codeblab.com/wp-content/uploads/2009/09/DualPivotQuicksort.pdf

Attachment Content-Type Size
0001-Specialize-checkpointer-sort-functions.patch text/x-patch 4.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-03-03 04:39:10 Re: Libpq support to connect to standby server as priority
Previous Message Greg Nancarrow 2021-03-03 04:05:45 Re: Libpq support to connect to standby server as priority