| From: | Aleksander Alekseev <aleksander(at)tigerdata(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Cc: | John Naylor <johncnaylorls(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Subject: | Re: [PATCH] Refactor bytea_sortsupport(), take two |
| Date: | 2025-11-10 11:22:29 |
| Message-ID: | CAJ7c6TOP=kjnT+Aeoaige-qp7pPstZ_M8CcrMTfvwUaiYu952w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi John,
Many thanks for the feedback.
> For example in *_abbrev_convert:
>
> [...]
>
> The first part of the comment explained why we do this at all. The
> bytea type does not use strcoll, so it's possible that a future patch
> may decide to skip cardinality estimation entirely. It's obviously not
> the responsibility of this refactoring patch to investigate that, but
> throwing the reason out makes it harder for someone to discover that
> bytea could do something different here. In this case, maybe we could
> point to varstr_abbrev_convert instead of repeating the whole comment.
Fixed. I choose to repeat the entire comment since it is relatively
short in this case.
> /*
> - * Clamp cardinality estimates to at least one distinct value. While
> - * NULLs are generally disregarded, if only NULL values were
> seen so far,
> - * that might misrepresent costs if we failed to clamp.
> + * Clamp cardinality estimates to at least one distinct value.
> */
>
> The old comment explained why we clamp, but the new comment just says
> what the code is doing, and it's obvious what the code is doing.
Fixed in the same manner.
> - /*
> - * In the worst case all abbreviated keys are identical, while
> at the same
> - * time there are differences within full key strings not captured in
> - * abbreviations.
> - */
>
> Why is this gone? I could go on, but hopefully you get my point. If
> it's short, just keep it, and adjust as necessary. If it's long, point
> to varlena.c if the idea is the same.
OK, I made sure all the comments are in place. I referenced
corresponding varlena.c functions where the comments were too long to
my taste to repeat them entirely.
> Back to the actual patch:
>
> - * More generally, it's okay that bytea callers can have NUL bytes in
> - * strings because abbreviated cmp need not make a distinction between
> - * terminating NUL bytes, and NUL bytes representing actual NULs in the
> - * authoritative representation. Hopefully a comparison at or past one
> - * abbreviated key's terminating NUL byte will resolve the comparison
> - * without consulting the authoritative representation; specifically, some
> - * later non-NUL byte in the longer string can resolve the comparison
> - * against a subsequent terminating NUL in the shorter string. There will
> - * usually be what is effectively a "length-wise" resolution there and
> - * then.
> - *
> - * If that doesn't work out -- if all bytes in the longer string
> - * positioned at or past the offset of the smaller string's (first)
> - * terminating NUL are actually representative of NUL bytes in the
> - * authoritative binary string (perhaps with some *terminating* NUL bytes
> - * towards the end of the longer string iff it happens to still be small)
> - * -- then an authoritative tie-breaker will happen, and do the right
> - * thing: explicitly consider string length.
>
> Why is this gone -- AFAICT it's still true for bytea, no matter what
> file it's located in?
Actually for bytea_abbrev_convert() this comment makes no sense IMO.
Presumably the reader is aware that '\x0000...' is a valid bytea, and
there is no such thing as "terminating NUL bytes" for bytea.
For varstr_abbrev_convert() as you correctly pointed out before [1] we
actually disallow NUL bytes [2]. The reason why the comment is
currently there is that varstr_abbrev_convert() may have bytea callers
which may have NUL bytes.
So it seems to me that rewriting this particular part is exactly in
scope of the refactoring, unless I'm missing something.
[1]: https://postgr.es/m/CANWCAZbDKESq30bn_6QPZqOyrP7JYxxz27Q5gymv0qJEDVj6_A%40mail.gmail.com
[2]: https://www.postgresql.org/docs/current/datatype-character.html
--
Best regards,
Aleksander Alekseev
| Attachment | Content-Type | Size |
|---|---|---|
| v5-0001-Refactor-bytea_sortsupport.patch | text/x-patch | 12.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Álvaro Herrera | 2025-11-10 11:27:03 | Re: [PATCH] Add pg_get_tablespace_ddl() function to reconstruct CREATE TABLESPACE statement |
| Previous Message | Tender Wang | 2025-11-10 11:11:16 | Fix a typo in the comment for gettuple_eval_partition() |