Re: [PATCH] Refactor bytea_sortsupport(), take two

From: Aleksander Alekseev <aleksander(at)tigerdata(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: John Naylor <johncnaylorls(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Subject: Re: [PATCH] Refactor bytea_sortsupport(), take two
Date: 2025-11-10 11:22:29
Message-ID: CAJ7c6TOP=kjnT+Aeoaige-qp7pPstZ_M8CcrMTfvwUaiYu952w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi John,

Many thanks for the feedback.

> For example in *_abbrev_convert:
>
> [...]
>
> The first part of the comment explained why we do this at all. The
> bytea type does not use strcoll, so it's possible that a future patch
> may decide to skip cardinality estimation entirely. It's obviously not
> the responsibility of this refactoring patch to investigate that, but
> throwing the reason out makes it harder for someone to discover that
> bytea could do something different here. In this case, maybe we could
> point to varstr_abbrev_convert instead of repeating the whole comment.

Fixed. I choose to repeat the entire comment since it is relatively
short in this case.

> /*
> - * Clamp cardinality estimates to at least one distinct value. While
> - * NULLs are generally disregarded, if only NULL values were
> seen so far,
> - * that might misrepresent costs if we failed to clamp.
> + * Clamp cardinality estimates to at least one distinct value.
> */
>
> The old comment explained why we clamp, but the new comment just says
> what the code is doing, and it's obvious what the code is doing.

Fixed in the same manner.

> - /*
> - * In the worst case all abbreviated keys are identical, while
> at the same
> - * time there are differences within full key strings not captured in
> - * abbreviations.
> - */
>
> Why is this gone? I could go on, but hopefully you get my point. If
> it's short, just keep it, and adjust as necessary. If it's long, point
> to varlena.c if the idea is the same.

OK, I made sure all the comments are in place. I referenced
corresponding varlena.c functions where the comments were too long to
my taste to repeat them entirely.

> Back to the actual patch:
>
> - * More generally, it's okay that bytea callers can have NUL bytes in
> - * strings because abbreviated cmp need not make a distinction between
> - * terminating NUL bytes, and NUL bytes representing actual NULs in the
> - * authoritative representation. Hopefully a comparison at or past one
> - * abbreviated key's terminating NUL byte will resolve the comparison
> - * without consulting the authoritative representation; specifically, some
> - * later non-NUL byte in the longer string can resolve the comparison
> - * against a subsequent terminating NUL in the shorter string. There will
> - * usually be what is effectively a "length-wise" resolution there and
> - * then.
> - *
> - * If that doesn't work out -- if all bytes in the longer string
> - * positioned at or past the offset of the smaller string's (first)
> - * terminating NUL are actually representative of NUL bytes in the
> - * authoritative binary string (perhaps with some *terminating* NUL bytes
> - * towards the end of the longer string iff it happens to still be small)
> - * -- then an authoritative tie-breaker will happen, and do the right
> - * thing: explicitly consider string length.
>
> Why is this gone -- AFAICT it's still true for bytea, no matter what
> file it's located in?

Actually for bytea_abbrev_convert() this comment makes no sense IMO.
Presumably the reader is aware that '\x0000...' is a valid bytea, and
there is no such thing as "terminating NUL bytes" for bytea.

For varstr_abbrev_convert() as you correctly pointed out before [1] we
actually disallow NUL bytes [2]. The reason why the comment is
currently there is that varstr_abbrev_convert() may have bytea callers
which may have NUL bytes.

So it seems to me that rewriting this particular part is exactly in
scope of the refactoring, unless I'm missing something.

[1]: https://postgr.es/m/CANWCAZbDKESq30bn_6QPZqOyrP7JYxxz27Q5gymv0qJEDVj6_A%40mail.gmail.com
[2]: https://www.postgresql.org/docs/current/datatype-character.html

--
Best regards,
Aleksander Alekseev

Attachment Content-Type Size
v5-0001-Refactor-bytea_sortsupport.patch text/x-patch 12.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Herrera 2025-11-10 11:27:03 Re: [PATCH] Add pg_get_tablespace_ddl() function to reconstruct CREATE TABLESPACE statement
Previous Message Tender Wang 2025-11-10 11:11:16 Fix a typo in the comment for gettuple_eval_partition()