Re: tuple radix sort

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: cca5507 <cca5507(at)qq(dot)com>
Cc: zengman <zengman(at)halodbtech(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: tuple radix sort
Date: 2026-02-13 02:32:29
Message-ID: CANWCAZZA_AqWy7xw20O=jWG+t0Cy4rUnvs1RaAABvSp3iLnGhg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 11, 2026 at 5:25 PM zengman <zengman(at)halodbtech(dot)com> wrote:
> I'm wondering if we should replace `state->memtuples` with `data` in the `sort_byvalue_datum()` function here.
> ```
> if (nulls_first)
> {
> null_start = state->memtuples;

Sounds good.

On Wed, Feb 11, 2026 at 6:47 PM cca5507 <cca5507(at)qq(dot)com> wrote:
> +1. Now data == state->memtuples, how about remove the parameter "data" and "n" and just
> get them from "Tuplesortstate"?

The parameter list currently looks like a sort function. I'm not sure
why you think that's objectionable?

> Some comments for v7:
>
> 1)
>
> ```
> /*
> * Retrieve byte from datum, indexed by 'level': 0 for LSB, 7 for MSB
> */
> static inline uint8
> current_byte(Datum key, int level)
> {
> int shift = (SIZEOF_DATUM - 1 - level) * BITS_PER_BYTE;
>
> return (key >> shift) & 0xFF;
> }
> ```
>
> Maybe "0 for MSB, 7 for LSB"? If level == 0, this function will return the Most Significant Byte.

Will fix, thanks.

> 2) radix_sort_tuple()
>
> ```
> size_t end_offset = partitions[*rp].next_offset;
> SortTuple *partition_end = begin + end_offset;
> ptrdiff_t num_elements = end_offset - start_offset;
> ```
>
> Why the type of "num_elements" is "ptrdiff_t"? Maybe just "size_t"?

That's from upstream. It's pretty rare in our codebase for in-core
code, so I'll go ahead and change it.

> 3) tuplesort_sort_memtuples()
>
> ```
> /*
> * Do we have the leading column's value or abbreviation in datum1?
> */
> if (state->base.haveDatum1 && state->base.sortKeys)
> {
> SortSupportData ssup = state->base.sortKeys[0];
> ```
>
> I think we should avoid the copy of SortSupportData. We can just use a pointer.

Will do, for consistency.

> 4)
>
> Many places just consider "Datum" as integer, do we need to add a DatumGetUInt**() for them?

It already does that for the case where it specifically needs uint32.
For the general case, I don't think so. We can treat a Datum as an
unsigned integer and don't care about the actual size or what it
contains. There is one exception that I can see: 0002 has a call to
pg_leftmost_one_pos64(), so a macro for that would be more appropriate
than an implicit cast.

Now I'm wondering if 0002 should normalize the datum as it determines
the common prefix. That should only matter for int32 where the input
has a mix of positive and negative, but it could be worth the
additional calculation. I'll hold off on committing 0002 for a bit
longer, but 0001 (plus the changes that I've agreed to) is still on
schedule.

Also, looking at commit 6aebedc38, it's preferred to use
"sizeof(Datum)" rather than the vestigial SIZEOF_DATUM, so I'll change
that in 0001/2.

--
John Naylor
Amazon Web Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2026-02-13 02:35:56 Re: Add 64-bit XIDs into PostgreSQL 15
Previous Message Michael Paquier 2026-02-13 02:31:12 Re: recovery.signal not cleaned up when both signal files are present