Quick Links

Re: tuple radix sort

From:	John Naylor <johncnaylorls(at)gmail(dot)com>
To:	Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc:	Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject:	Re: tuple radix sort
Date:	2026-01-21 06:40:11
Message-ID:	CANWCAZYur4VDbx9zkpFM3mM4-Hj4_Vt1t=Z4TPhK7+3=T6BMOA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Attached is v6, which seems pretty close to committable. The temporary
GUC is gone, as are all the integer-like qsort specializations. There
are a few cosmetic rearrangements, but the algorithm is pretty much
the same as v5. The only visible behavior difference should be the
addition of a presorted check just like qsort has.

* The only things I have doubts about are the user-visible messages
mentioning "qsort". For trace_sort, it seemed logical to re-use the
term "internal sort" from other such messages, so I've done that for
v6. EXPLAIN ANALYZE is a bit harder: Do we want to even change
anything here? When things like this come up, there is the question of
tools that parse the output. It doesn't seem particularly important to
try to display exact info here, especially since radix sort must
divert to qsort for multiple keys etc.

Other updates:

1) I wanted a better test for small input, in case the threshold
needed adjusting. I lost the test during a fumbled rebase, but it's
easy to describe: instead of calling tuplesort_sort_memtuples(), call
both radix sort and qsort on copies of the input, with RDTSC calls
around each. I've attached the results, which are average ticks per
element sorted, averaging a million runs. The threshold seems to be
fine at 40, even down to a cardinality of 6. It's important to be as
low as we can get away with, since "qsort" here is qsort_tuple, not
the former specialized variants for integer comparators.

2) We don't need to try harder in the NULL partitioning step

For example, the case below with all NULL in the first key. The
underlying behavioral difference is, with qsort the comparator tries
the NULL first key (which ties every time), and then tries the second
key. With radix sort, the first key NULLs are partitioned first, doing
no useful work since they're all NULL, then qsort is called with the
tiebreak comparator on the whole input.

By this test, there is only noise difference with low cardinality
(before I removed the GUC):

drop table if exists test;
create unlogged table test (a bigint, b bigint not null);
insert into test select
NULL,
(1_000_000_000 * random())::bigint % 8
from generate_series(1,1_000_000,1) i;
vacuum freeze test;
select pg_prewarm('test');

set work_mem = '500MB';
\timing

set wip_radix_sort = 'off'; select * from test order by a, b offset
1_000_000_000;
280ms

set wip_radix_sort = 'on'; select * from test order by a, b offset
1_000_000_000;
277ms

Speaking of the NULL partitioning step, I've tried to add some
comments that make sense, but the pictures in the linked blog are more
helpful than any words I can come up with. It's easier to understand
than to describe.

--
John Naylor
Amazon Web Services

Attachment	Content-Type	Size
20260120-radix-sort-threshold-test.txt	text/plain	3.2 KB
v6-0002-Detect-common-prefix-to-avoid-wasted-work-during-.patch	application/x-patch	2.7 KB
v6-0001-Use-radix-sort-when-SortTuple-contains-a-pass-by-.patch	application/x-patch	26.4 KB

In response to

Re: tuple radix sort at 2025-12-08 06:31:07 from Chao Li

Responses

Re: tuple radix sort at 2026-01-21 08:12:42 from zengman

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	VASUKI M	2026-01-21 06:43:55	Re: Optional skipping of unchanged relations during ANALYZE?
Previous Message	David Rowley	2026-01-21 06:36:03	Re: Optional skipping of unchanged relations during ANALYZE?