Re: Insertion Sort Improvements

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Benjamin Coutu <ben(dot)coutu(at)zeyos(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Insertion Sort Improvements
Date: 2022-08-26 11:26:27
Message-ID: CAFBsxsHPrvpqfNpHHC6R9fTe_us=AJ4_REWdUO9NuHcwPFF1tA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 25, 2022 at 5:55 PM Benjamin Coutu <ben(dot)coutu(at)zeyos(dot)com> wrote:
>
> Hello,
>
> Inspired by the recent discussions[1][2] around sort improvements, I took a look around the code and noticed the use of a somewhat naive version of insertion sort within the broader quicksort code.
>
> The current implementation (see sort_template.h) is practically the textbook version of insertion sort:

Right. I think it's worth looking into. When I tested dual-pivot
quicksort, a threshold of 18 seemed best for my inputs, so making
insertion sort more efficient could tilt the balance more in favor of
dual-pivot. (It already seems slightly ahead, but as I mentioned in
the thread you linked, removing branches for null handling would make
it more compelling).

> DO_ASSIGN and DO_COPY macros would have to be declared analogue to DO_SWAP via the template.

I don't think you need these macros, since this optimization is only
convenient if you know the type at compile time. See the attached,
which I had laying around when I was looking at PDQuicksort. I believe
it's similar to what you have in mind. (Ignore the part about
"unguarded", it's irrelevant out of context). Notice that it avoids
unnecessary copies, but has two calls to DO_COMPARE, which is *not*
small for tuples.

> Anyways, there might be some low hanging fruit here. If it turns out to be significantly faster one might even consider increasing the insertion sort threshold from < 7 to < 10 sort elements. But that is a whole other discussion for another day.

I think it belongs around 10 now anyway. I also don't think you'll see
much benefit with your proposal at a threshold of 7 -- I suspect it'd
be more enlightening to test a range of thresholds with and without
the patch, to see how the inflection point shifts. That worked pretty
well when testing dual-pivot.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v3-0008-Use-unguarded-insertion-sort-where-possible.patch text/x-patch 4.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2022-08-26 11:26:37 Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~
Previous Message Dilip Kumar 2022-08-26 11:00:54 Re: making relfilenodes 56 bits