Re: Insertion Sort Improvements

From: Benjamin Coutu <ben(dot)coutu(at)zeyos(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Insertion Sort Improvements
Date: 2023-05-24 05:59:45
Message-ID: 96811ebc460cd665205f@zeyos.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> That's worth trying out. It might also then be worth trying to push both unordered values -- the big one up / the small one down. I've seen other implementations do that, but don't remember where, or what it's called.

It is important that we do not do 2 compares two avoid one copy (assignment to temporary) as you did in your patch earlier in this thread, cause compares are usually pretty costly (also two compares are then inlined, bloating the binary).
Assigning a sort tuple to a temporary translates to pretty simple assembly code, so my suggested modification should outperform. It cuts down the cost of the inner loop by ca. 40% comparing the assembly. And it avoids having to re-read memory on each comparison, as the temporary can be held in registers.

> "Unbounded" means no bounds check on the array. That's not possible in its current form, so I think you misunderstood something.

Sorry for the confusion. I didn't mean unbounded in the "array bound checking" sense, but in the "unrestricted number of loops" sense.

> I only remember implementations tracking loop iterations, not swaps. You'd need evidence that this is better.

Well, the idea was to include the presorted check somehow. Stopping after a certain number of iterations is surely more safe than stopping after a number of swaps, but we would then implicitly also stop our presort check. We could change that though: Count loop iterations and on bailout continue with a pure presort check, but from the last position of the insertion sort -- not all over again -- by comparing against the maximum value recorded during the insertion sort. Thoughts?

> An important part not mentioned yet: This might only be worth doing if the previous recursion level performed no swaps during partitioning and the current pivot candidates are ordered.

Agreed.

> It's currently 7, but should really be something like 10. A script that repeats tests for, say, 7 through 18 should show a concave-up shape in the times. The bottom of the bowl should shift to higher values, and that minimum is what should be compared.

Yeah, as alluded to before, it should be closer to 10 nowadays.
In any case it should be changed at least from 7 to 8, cause then we could at least discard the additional check for n > 7 in the quicksort code path (see /src/include/lib/sort_template.h#L322). Currently we check n < 7 and a few lines down we check for n > 7, if we check n < 8 for insertion sort then the second check becomes obsolete.

Benjamin Coutu
http://www.zeyos.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-05-24 06:18:13 Re: Large files for relations
Previous Message Zhijie Hou (Fujitsu) 2023-05-24 05:53:51 RE: walsender performance regression due to logical decoding on standby changes