Re: A qsort template

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A qsort template
Date: 2021-07-30 00:34:01
Message-ID: CAFBsxsGzW5zTEGP6HPPh9vxkxJd_6kYN1-mRi6bKy+nM+oNpdA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jul 4, 2021 at 12:27 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> Since you are experimenting with tuplesort and likely thinking similar
> thoughts, here's a patch I've been using to explore that area. I've
> seen it get, for example, ~1.18x speedup for simple index builds in
> favourable winds (YMMV, early hacking results only). Currently, it
> kicks in when the leading column is of type int4, int8, timestamp,
> timestamptz, date or text + friends (when abbreviatable, currently
> that means "C" and ICU collations only), while increasing the
> executable by only 8.5kB (Clang, amd64, -O2, no debug).
>
> These types are handled with just three specialisations. Their custom
> "fast" comparators all boiled down to comparisons of datum bits,
> varying only in signedness and width, so I tried throwing them away
> and using 3 new common routines. Then I extended
> tuplesort_sort_memtuples()'s pre-existing specialisation dispatch to
> recognise qualifying users of those and select 3 corresponding sort
> specialisations.

I got around to getting a benchmark together to serve as a starting point.
I based it off something I got from the archives, but don't remember where
(I seem to remember Tomas Vondra wrote the original, but not sure). To
start I just used types that were there already -- int, text, numeric. The
latter two won't be helped by this patch, but I wanted to keep something
like that so we can see what kind of noise variation there is. I'll
probably cut text out in the future and just keep numeric for that purpose.

I've attached both the script and a crude spreadsheet. I'll try to figure
out something nicer for future tests, and maybe some graphs. The
"comparison" sheet has the results side by side (min of five). There are 6
distributions of values:
- random
- sorted
- "almost sorted"
- reversed
- organ pipe (first half ascending, second half descending)
- rotated (sorted but then put the smallest at the end)
- random 0s/1s

I included both "select a" and "select *" to make sure we have the recent
datum sort optimization represented. The results look pretty good for ints
-- about the same speed up master gets going from tuple sorts to datum
sorts, and those got faster in turn also.

Next I think I'll run microbenchmarks on int64s with the test harness you
attached earlier, and experiment with the qsort parameters a bit.

I'm also attaching your tuplesort patch so others can see what exactly I'm
comparing.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
sort-bench-jcn1.sh text/x-sh 7.5 KB
test-acc-tuplesort-20210729.ods application/vnd.oasis.opendocument.spreadsheet 17.9 KB
0001-WIP-Accelerate-tuple-sorting-for-common-types.patch application/octet-stream 15.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-07-30 00:48:47 Re: Replace l337sp34k in comments.
Previous Message Bossart, Nathan 2021-07-29 23:37:56 Re: Slightly improve initdb --sync-only option's help message