| From: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
|---|---|
| To: | "Linus Heckemann" <linus(at)schreibt(dot)jetzt> |
| Cc: | "pgsql-general mailing list" <pgsql-general(at)postgresql(dot)org> |
| Subject: | Re: Can we get sha* function over text, that could be used in index? |
| Date: | 2026-02-20 21:40:37 |
| Message-ID: | 63a9ec31-8a5f-47c7-8a70-f1c57207ffd3@manitou-mail.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-general |
Linus Heckemann wrote:
> - there is a byte-array representation of text columns, which appears to
> be independent of database encoding
Not sure what you're refering to. Both the on-disk and in-memory
representations of text/varchar are encoding-dependent.
> The obvious (to a naive user, like I was) approach, casting to bytea,
> has exceptionally surprising behaviour: for many text strings, it does
> exactly what the naive user might hope for, giving back the UTF-8
> representation. But multiple distinct text strings, like '\033' and
> '\x1b', convert to the same byte string! And text strings containing a
> backslash that doesn't fit the bytea hex format or the bytea escape
> format will fail to convert completely!
Yes. It seems a common mistake to forget or ignore that
backslashes are special in the input text representation of bytea.
It might be not obvious from reading the doc at [1]
but we just need to quote backslashes by doubling them.
AFAIK a working solution for the OP would be:
sha256(replace(colname, '\', '\\')::bytea)
The result is encoding-dependent, but that does not matter
in the context of an expression. index.
If the database ever needs to change its encoding, it will have to
be recreated entirely anyway.
[1]
https://www.postgresql.org/docs/current/datatype-binary.html#DATATYPE-BINARY-BYTEA-ESCAPE-FORMAT
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Linus Heckemann | 2026-02-21 09:54:53 | Re: Can we get sha* function over text, that could be used in index? |
| Previous Message | Adrian Klaver | 2026-02-20 18:40:05 | Re: Clarification on PGDG Package Policies Across Platforms |