Quick Links

Re: [PATCH] Refactor bytea_sortsupport(), take two

From:	John Naylor <johncnaylorls(at)gmail(dot)com>
To:	Aleksander Alekseev <aleksander(at)tigerdata(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Subject:	Re: [PATCH] Refactor bytea_sortsupport(), take two
Date:	2025-11-13 09:38:04
Message-ID:	CANWCAZbMyrijdR0xc-4SqpNJBHMEwRZccBK4fa0aquNpq2Uj7w@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, Nov 10, 2025 at 6:22 PM Aleksander Alekseev
<aleksander(at)tigerdata(dot)com> wrote:

> OK, I made sure all the comments are in place. I referenced
> corresponding varlena.c functions where the comments were too long to
> my taste to repeat them entirely.

Looks mostly okay. The "See varstr_abbrev_abort() for more details."
are repeated several times close together, so I would replace those
with a header comment like "This is based on varstr_abbrev_abort(),
but some comments have been elided for brevity. See there for more
details."

+ * the worst case, where we do many string transformations for no saving
+ * in full strcoll()-based comparisons. These statistics are used by
+ * bytea_abbrev_abort().

s/strcoll/memcmp/, and maybe s/transformations/abbreviations/ since
that was talking about strxfrm().

> > - * More generally, it's okay that bytea callers can have NUL bytes in
> > - * strings because abbreviated cmp need not make a distinction between
> > - * terminating NUL bytes, and NUL bytes representing actual NULs in the
> > - * authoritative representation. Hopefully a comparison at or past one

> > Why is this gone -- AFAICT it's still true for bytea, no matter what
> > file it's located in?
>
> Actually for bytea_abbrev_convert() this comment makes no sense IMO.
> Presumably the reader is aware that '\x0000...' is a valid bytea, and
> there is no such thing as "terminating NUL bytes" for bytea.

It makes perfect sense to me. The abbreviated datum has terminating
NUL bytes if the source length is shorter than 8 bytes. The comment is
explaining why comparing bytea abbreviated datums still works. It
seems like everything starting with "abbreviated cmp need not ..." is
still appropriate for bytea.c.

Likewise, this part of varlena.c is still factually accurate as an aside:

- * (Actually, even if there were NUL bytes in the blob it would be
- * okay. See remarks on bytea case above.)

...although with the above, we'll have to point to the relevant place
in bytea.c.

I only have one real nitpick left (diff'ing between files):

/* Hash abbreviated key */
{
- uint32 tmp;
+ uint32 lohalf,
+ hihalf;

- tmp = DatumGetUInt32(res) ^ (uint32) (DatumGetUInt64(res) >> 32);
- hash = DatumGetUInt32(hash_uint32(tmp));
+ lohalf = (uint32) res;
+ hihalf = (uint32) (res >> 32);
+ hash = DatumGetUInt32(hash_uint32(lohalf ^ hihalf));
}

If we're going to do anything different here at all, the logical thing
would be to simplify with murmurhash64() since datums are now 8 bytes
everywhere. That's material for a separate patch, though.

--
John Naylor
Amazon Web Services

In response to

Re: [PATCH] Refactor bytea_sortsupport(), take two at 2025-11-10 11:22:29 from Aleksander Alekseev

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Yugo Nagata	2025-11-13 09:40:36	Re: Suggestion to add --continue-client-on-abort option to pgbench
Previous Message	Quan Zongliang	2025-11-13 09:36:19	Re: [PATCH] Add pg_get_database_ddl() function to reconstruct CREATE DATABASE statement