Re: A qsort template

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A qsort template
Date: 2022-01-19 02:39:21
Message-ID: CAFBsxsEFGAJ9eBpQVb5a86BE93WER3497zn2OT5wbjm1HHcqgA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:

> That said, I think what I'll do next is test the v3-0001 standalone
> patch with tuplesort specializations for more data types. I already
> have a decent test script that I can build on for this.

I've run a test with 10 million records using all types found in the
v3 patch "accelerate tuple sorting for common types", using a variety
of initial orderings, covering index build (btree only, no gist) and
queries (both single value and whole record). Attached is the test
script and a spreadsheet with the raw data as well as comparisons of
the min runtimes in seconds from 5 runs. This is using gcc 11.1 on
fairly recent Intel hardware.

Overall, this shows a good improvement for these types. One exception
is the "0/1" ordering, which is two values in random order. I'm
guessing it's because of the cardinality detector, but some runs have
apparent possible regressions. It's a bit high and sporadic to just
blow off as noise, but this case might not be telling us anything
useful.

Other notes:

- The inet type seems unnaturally fast in some places, meaning faster
than int or date. That's suspicous, but I haven't yet dug deeper into
that.

- With the patch, the VM binary size increases by ~9kB.

I have some hunches on the "future research" comments:

XXX Can we avoid repeating the null-handling logic?

More templating? ;-)

XXX Is it worth specializing for reverse sort?

I'll run a limited test on DESC to see if anything stands out, but I
wonder if the use case is not common -- I seem to remember seeing DESC
less often on the first sort key column.

XXX Is it worth specializing for nulls first, nulls last, not null?

Editorializing the null position in queries is not very common in my
experience. Not null is interesting since it'd be trivial to pass
constant false to the same Apply[XYZ]SortComparator() and let the
compiler remove all those branches for us. On the other hand, those
branches would be otherwise predicted well, so it might make little or
no difference.

XXX Should we have separate cases for "result is authoritative", "need
XXX tiebreaker for atts 1..n (= abbrev case)", "need tie breaker for
XXX atts 2..n"?

The first one seems to be the only case where the SortTuple could be
smaller, since the tuple pointer is null. That sounds like a good
avenue to explore. Less memory usage is always good.

Not sure what you mean by the third case -- there are 2+ sort keys,
but the first is authoritative from the datum, so the full comparison
can skip the first key?

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
qsort-specialize-types-jcn1.ods application/vnd.oasis.opendocument.spreadsheet 36.0 KB
test-tuplesort-20220113.ods application/vnd.oasis.opendocument.spreadsheet 20.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-01-19 02:50:07 Re: Adding CI to our tree
Previous Message Tom Lane 2022-01-19 02:29:16 Re: pgsql: Modify pg_basebackup to use a new COPY subprotocol for base back