Re: [PATCH] Refactor bytea_sortsupport(), take two

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Aleksander Alekseev <aleksander(at)tigerdata(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Refactor bytea_sortsupport(), take two
Date: 2025-11-26 07:18:52
Message-ID: CANWCAZY0ndYj881r_4OW-COj=HvtawXc8hJ5OquL2ma1vYn2mg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 17, 2025 at 5:57 PM Aleksander Alekseev
<aleksander(at)tigerdata(dot)com> wrote:
> > > > - * More generally, it's okay that bytea callers can have NUL bytes in
> > > > - * strings because abbreviated cmp need not make a distinction between
> > > > - * terminating NUL bytes, and NUL bytes representing actual NULs in the
> > > > - * authoritative representation. Hopefully a comparison at or past one
> >
> > > > Why is this gone -- AFAICT it's still true for bytea, no matter what
> > > > file it's located in?
> > >
> > > Actually for bytea_abbrev_convert() this comment makes no sense IMO.
> > > Presumably the reader is aware that '\x0000...' is a valid bytea, and
> > > there is no such thing as "terminating NUL bytes" for bytea.
> >
> > It makes perfect sense to me. The abbreviated datum has terminating
> > NUL bytes if the source length is shorter than 8 bytes. The comment is
> > explaining why comparing bytea abbreviated datums still works. It
> > seems like everything starting with "abbreviated cmp need not ..." is
> > still appropriate for bytea.c.
>
> OK, I returned the comment to bytea.c, but replaced "string" with
> "bytea". IMO mentioning "strings" is confusing in this context.
>
> I still don't like the fact that the comment is reasoning about
> "terminating NUL bytes" in the context of bytea, but I kept it as is
> for now. It seems confusing to me. Do you think we should rewrite
> this?

+ /*
+ * It's okay that callers can have NUL bytes in bytea because abbreviated
+ * cmp need not make a distinction between terminating NUL bytes, and NUL
+ * bytes representing actual NULs in the authoritative representation.

I mentioned everything starting with "abbreviated cmp need not ..."
was okay, so I agree that "It's okay that callers can have NUL bytes
in bytea" is redundant. Was that the confusing part for you? How about
this:

"Short byteas will have terminating NUL bytes in the abbreviated
datum. Abbreviated comparison need not make a distinction between
thse NUL bytes, and NUL bytes representing actual NULs in the
authoritative representation." [...]

After that, the rest seems to flow better at a quick glance.

> > Likewise, this part of varlena.c is still factually accurate as an aside:
> >
> > - * (Actually, even if there were NUL bytes in the blob it would be
> > - * okay. See remarks on bytea case above.)
> >
> > ...although with the above, we'll have to point to the relevant place
> > in bytea.c.
>
> Not sure if I follow. Previously we agreed that we disallow NUL bytes
> within strings, except for terminating ones. What we are doing here is
> refactoring / simplifying the code. Why keep irrelevant and confusing
> comments?

Okay, I'll concede that's it's no longer useful and maybe a distraction..

--
John Naylor
Amazon Web Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2025-11-26 07:23:39 Re: pg_waldump: support decoding of WAL inside tarfile
Previous Message shveta malik 2025-11-26 07:13:26 Re: How can end users know the cause of LR slot sync delays?