| From: | John Naylor <johncnaylorls(at)gmail(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie> |
| Subject: | Re: tuple radix sort |
| Date: | 2026-01-21 06:40:11 |
| Message-ID: | CANWCAZYur4VDbx9zkpFM3mM4-Hj4_Vt1t=Z4TPhK7+3=T6BMOA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Attached is v6, which seems pretty close to committable. The temporary
GUC is gone, as are all the integer-like qsort specializations. There
are a few cosmetic rearrangements, but the algorithm is pretty much
the same as v5. The only visible behavior difference should be the
addition of a presorted check just like qsort has.
* The only things I have doubts about are the user-visible messages
mentioning "qsort". For trace_sort, it seemed logical to re-use the
term "internal sort" from other such messages, so I've done that for
v6. EXPLAIN ANALYZE is a bit harder: Do we want to even change
anything here? When things like this come up, there is the question of
tools that parse the output. It doesn't seem particularly important to
try to display exact info here, especially since radix sort must
divert to qsort for multiple keys etc.
Other updates:
1) I wanted a better test for small input, in case the threshold
needed adjusting. I lost the test during a fumbled rebase, but it's
easy to describe: instead of calling tuplesort_sort_memtuples(), call
both radix sort and qsort on copies of the input, with RDTSC calls
around each. I've attached the results, which are average ticks per
element sorted, averaging a million runs. The threshold seems to be
fine at 40, even down to a cardinality of 6. It's important to be as
low as we can get away with, since "qsort" here is qsort_tuple, not
the former specialized variants for integer comparators.
2) We don't need to try harder in the NULL partitioning step
For example, the case below with all NULL in the first key. The
underlying behavioral difference is, with qsort the comparator tries
the NULL first key (which ties every time), and then tries the second
key. With radix sort, the first key NULLs are partitioned first, doing
no useful work since they're all NULL, then qsort is called with the
tiebreak comparator on the whole input.
By this test, there is only noise difference with low cardinality
(before I removed the GUC):
drop table if exists test;
create unlogged table test (a bigint, b bigint not null);
insert into test select
NULL,
(1_000_000_000 * random())::bigint % 8
from generate_series(1,1_000_000,1) i;
vacuum freeze test;
select pg_prewarm('test');
set work_mem = '500MB';
\timing
set wip_radix_sort = 'off'; select * from test order by a, b offset
1_000_000_000;
280ms
set wip_radix_sort = 'on'; select * from test order by a, b offset
1_000_000_000;
277ms
Speaking of the NULL partitioning step, I've tried to add some
comments that make sense, but the pictures in the linked blog are more
helpful than any words I can come up with. It's easier to understand
than to describe.
--
John Naylor
Amazon Web Services
| Attachment | Content-Type | Size |
|---|---|---|
| 20260120-radix-sort-threshold-test.txt | text/plain | 3.2 KB |
| v6-0002-Detect-common-prefix-to-avoid-wasted-work-during-.patch | application/x-patch | 2.7 KB |
| v6-0001-Use-radix-sort-when-SortTuple-contains-a-pass-by-.patch | application/x-patch | 26.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | VASUKI M | 2026-01-21 06:43:55 | Re: Optional skipping of unchanged relations during ANALYZE? |
| Previous Message | David Rowley | 2026-01-21 06:36:03 | Re: Optional skipping of unchanged relations during ANALYZE? |