| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, Aleksander Alekseev <aleksander(at)tigerdata(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
| Subject: | Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions |
| Date: | 2026-03-26 01:21:38 |
| Message-ID: | CAD21AoC=92zAEmaGt+3U1jSD77jgdQT_HXUwPPtZEze2sJ=vWA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Mar 25, 2026 at 6:09 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Mar 25, 2026 at 5:35 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Tomas Vondra <tomas(at)vondra(dot)me> writes:
> > > On 3/26/26 00:40, Tom Lane wrote:
> > >> I believe what's happening there is that in cs_CZ locale,
> > >> "V" doesn't follow simple ASCII sort ordering.
> >
> > > With cs_CZ all letters sort *before* numbers, while in en_US it's the
> > > other way around. V is not special in any way.
> >
> > Ah, sorry, I should have researched a bit instead of relying on
> > fading memory. The quirk I was thinking of is that in cs_CZ,
> > "ch" sorts after "h":
> >
> > u8=# select 'h' < 'ch'::text collate "en_US";
> > ?column?
> > ----------
> > f
> > (1 row)
> >
> > u8=# select 'h' < 'ch'::text collate "cs_CZ";
> > ?column?
> > ----------
> > t
> > (1 row)
> >
> > Regular hex encoding isn't bitten by that because it doesn't
> > use 'h' in the text form ... but this base32hex thingie does.
> >
> > However, your point is also correct:
> >
> > u8=# select '0' < 'C'::text ;
> > ?column?
> > ----------
> > t
> > (1 row)
> >
> > u8=# select '0' < 'C'::text collate "cs_CZ";
> > ?column?
> > ----------
> > f
> > (1 row)
> >
> > and that breaks "text ordering matches numeric ordering"
> > for both traditional hex and base32hex. So maybe this
> > is not as big a deal as I first thought. We need a fix
> > for the new test though. Probably adding COLLATE "C"
> > would be enough.
>
> Thank you for the report and the analysis.
>
> I've reproduced the issue with "cs_CZ" collation and adding COLLATE
> "C" to the query resolves it. It seems also a good idea to add a note
> in the documentation too as users might face the same issue. For
> example,
>
> To maintain the lexicographical sort order of the encoded data, ensure
> that the text is sorted using the C collation (e.g., using COLLATE
> "C"). Natural language collations may sort characters differently and
> break the ordering.
>
Attached the patch doing the above idea.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Fix-UUID-sortability-tests-in-base32hex-encoding.patch | text/x-patch | 3.6 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2026-03-26 01:31:19 | Re: Track skipped tables during autovacuum and autoanalyze |
| Previous Message | Yugo Nagata | 2026-03-26 01:18:39 | Re: Track skipped tables during autovacuum and autoanalyze |