branch-free tuplesort partitioning

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: branch-free tuplesort partitioning
Date: 2024-11-25 12:13:39
Message-ID: CANWCAZbAmaZ7P+ARjS97sJLXsBB5CPZyzFgqNDiqe-L+BqXzug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached is a very rough and limited proof of concept of $subject when
tuplesort uses abbreviated keys. It only works with int64:

Demo:

--setup
drop table if exists test;
create table test (a bigint);
insert into test select (1_000_000_000 * random())::bigint from
generate_series(1,1_000_000,1) i;
vacuum freeze test;
create extension if not exists pg_prewarm ;
select pg_prewarm('test');

set work_mem = '64MB';
\timing

select * from test order by a offset 1_000_000_000;
Time: 238.925 ms

set debug_branchless_sort = 'on'; select * from test order by a offset
1_000_000_000;
Time: 202.977 ms

With the patch, this case is probably held back by our simple
insertion sort, which I haven't changed.

The algorithm with pythonic pseudocode is from [1]. There is also Rust
code under a permissive license by the same authors elsewhere.

To evaluate this technique further, it'll need some work to handle
duplicates well. Then we can run tests on various input distributions.
If that's promising, we'll need to proceed with an idea I had years
ago to separate null/not-null into separate arrays as a prerequisite.
Then, we can probably simplify tuplesort.c some, including removing
each separate template invocation for integer-like keys. I'm also not
using the sort template in the PoC, to simplify development, and it's
not clear this work can be folded back into there without adding a lot
of complexity.

[1] https://orlp.net/blog/branchless-lomuto-partitioning/

--
John Naylor
Amazon Web Services

Attachment Content-Type Size
v1-0001-Branchless-Lomuto-partitioning.patch text/x-patch 7.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-11-25 12:16:14 Re: Add support for Tcl 9
Previous Message Yash Jain 2024-11-25 11:49:24 What db objects can only be created with superuser?