| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie> |
| Subject: | Re: tuple radix sort |
| Date: | 2025-10-29 08:25:03 |
| Message-ID: | DDA507E1-56DB-4D76-8B81-41660ED932B9@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Oct 29, 2025, at 14:28, John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> I suspect the challenge
> will be multikey sorts when the first key has low cardinality.
As you predicted, when the first key has very low cardinality, radix is a little bit slower. I built a test that proves that:
```
evantest=# drop table if exists test_multi;
evantest=# create unlogged table test_multi (category int, name text);
— first column has only 4 distinct values
evantest=# insert into test_multi select (random() * 4)::int as category, md5(random()::text) || md5(random()::text) as name from generate_series(1, 1000000);
evantest=# vacuum freeze test_multi;
evantest=# select count(*) from test_multi;
evantest=# set work_mem = '64MB’;
evantest-# \timing on
Timing is on.
evantest=# set wip_radix_sort = 'off';
Time: 0.403 ms
evantest=# \o /dev/null
evantest=# select * from test_multi order by category, name;
Time: 5607.336 ms (00:05.607)
evantest=# select * from test_multi order by category, name;
Time: 5703.555 ms (00:05.704)
evantest=# select * from test_multi order by category, name;
Time: 5692.644 ms (00:05.693)
evantest=# set wip_radix_sort = 'on';
Time: 0.859 ms
evantest=# select * from test_multi order by category, name;
Time: 5822.979 ms (00:05.823)
evantest=# select * from test_multi order by category, name;
Time: 5881.256 ms (00:05.881)
evantest=# select * from test_multi order by category, name;
Time: 5976.351 ms (00:05.976)
```
Roughly 5% slower for this corner case.
However, when I recreate the test table with high cardinality first column, wip_radix_sort seems still slower:
```
evantest=# \o
evantest=# drop table if exists test_multi;
DROP TABLE
evantest=# create unlogged table test_multi (category int, name text);
CREATE TABLE
evantest=# insert into test_multi
evantest-# select (random() * 1000000)::int as category, md5(random()::text) || md5(random()::text) as name from generate_series(1, 1000000);
INSERT 0 1000000
evantest=# vacuum freeze test_multi;
VACUUM
evantest=# select count(*) from test_multi;
count
---------
1000000
(1 row)
evantest=# select * from test_multi limit 5;
category | name
----------+------------------------------------------------------------------
607050 | c555126a5afea9f5ffe3880248c89944d211bc378f8c3b6d125b4360fe8619b7
843579 | 69b5a1dba76f52ff238566a3f88315a7425116d5d271fb54701b6e49d4afd8ce
106298 | a96e8674db219e12463ecdbb405b99c631767972e489093045c97976c17c6561
621860 | 5e6739ea9f533f9cdb0b8db76e3d4ce63be6b2b612c8aff06c4b80451f8f2edc
290110 | 56944320e5abd3a854fffdd185b969727e8d414448d440725a392cda4c6355c4
(5 rows)
evantest=# \timing on
Timing is on.
evantest=# \o /dev/null
evantest=# set wip_radix_sort = 'off';
Time: 0.904 ms
evantest=# select * from test_multi limit 5;
Time: 0.983 ms
evantest=# select * from test_multi order by category, name;
Time: 593.578 ms
evantest=# select * from test_multi order by category, name;
Time: 597.329 ms
evantest=# select * from test_multi order by category, name;
Time: 600.050 ms
evantest=# set wip_radix_sort = 'on';
Time: 0.737 ms
evantest=# select * from test_multi order by category, name;
Time: 611.604 ms
evantest=# select * from test_multi order by category, name;
Time: 613.115 ms
evantest=# select * from test_multi order by category, name;
Time: 615.003 ms
```
This seems like a real regression.
Then I tried to only sort on the first column, yes, now radix is faster:
```
evantest=# set wip_radix_sort = 'off’;
evantest=# select * from test_multi order by category;
Time: 445.498 ms
evantest=# select * from test_multi order by category;
Time: 451.834 ms
evantest=# select * from test_multi order by category;
Time: 454.531 ms
evantest=# set wip_radix_sort = 'on';
Time: 0.329 ms
evantest=# select * from test_multi order by category;
Time: 402.829 ms
evantest=# select * from test_multi order by category;
Time: 408.014 ms
evantest=# select * from test_multi order by category;
Time: 415.340 ms
evantest=# select * from test_multi order by category;
Time: 413.969 ms
```
Hope the test helps. (The test was run a MacBook M4. )
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | John Naylor | 2025-10-29 08:54:11 | Re: [PATCH] Refactor bytea_sortsupport(), take two |
| Previous Message | Peter Eisentraut | 2025-10-29 07:51:00 | MSVC: Improve warning options set |