Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Aleksander Alekseev <aleksander(at)tigerdata(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Subject: Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Date: 2026-03-26 01:21:38
Message-ID: CAD21AoC=92zAEmaGt+3U1jSD77jgdQT_HXUwPPtZEze2sJ=vWA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 25, 2026 at 6:09 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Mar 25, 2026 at 5:35 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Tomas Vondra <tomas(at)vondra(dot)me> writes:
> > > On 3/26/26 00:40, Tom Lane wrote:
> > >> I believe what's happening there is that in cs_CZ locale,
> > >> "V" doesn't follow simple ASCII sort ordering.
> >
> > > With cs_CZ all letters sort *before* numbers, while in en_US it's the
> > > other way around. V is not special in any way.
> >
> > Ah, sorry, I should have researched a bit instead of relying on
> > fading memory. The quirk I was thinking of is that in cs_CZ,
> > "ch" sorts after "h":
> >
> > u8=# select 'h' < 'ch'::text collate "en_US";
> > ?column?
> > ----------
> > f
> > (1 row)
> >
> > u8=# select 'h' < 'ch'::text collate "cs_CZ";
> > ?column?
> > ----------
> > t
> > (1 row)
> >
> > Regular hex encoding isn't bitten by that because it doesn't
> > use 'h' in the text form ... but this base32hex thingie does.
> >
> > However, your point is also correct:
> >
> > u8=# select '0' < 'C'::text ;
> > ?column?
> > ----------
> > t
> > (1 row)
> >
> > u8=# select '0' < 'C'::text collate "cs_CZ";
> > ?column?
> > ----------
> > f
> > (1 row)
> >
> > and that breaks "text ordering matches numeric ordering"
> > for both traditional hex and base32hex. So maybe this
> > is not as big a deal as I first thought. We need a fix
> > for the new test though. Probably adding COLLATE "C"
> > would be enough.
>
> Thank you for the report and the analysis.
>
> I've reproduced the issue with "cs_CZ" collation and adding COLLATE
> "C" to the query resolves it. It seems also a good idea to add a note
> in the documentation too as users might face the same issue. For
> example,
>
> To maintain the lexicographical sort order of the encoded data, ensure
> that the text is sorted using the C collation (e.g., using COLLATE
> "C"). Natural language collations may sort characters differently and
> break the ordering.
>

Attached the patch doing the above idea.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
0001-Fix-UUID-sortability-tests-in-base32hex-encoding.patch text/x-patch 3.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2026-03-26 01:31:19 Re: Track skipped tables during autovacuum and autoanalyze
Previous Message Yugo Nagata 2026-03-26 01:18:39 Re: Track skipped tables during autovacuum and autoanalyze