Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Sergey Prokhorenko <sergeyprokhorenko(at)yahoo(dot)com(dot)au>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Date: 2025-10-24 23:31:25
Message-ID: CAD21AoCzEDdwpyPwA0d-QmCRe5rMz3m160SJgxMwKke85e8n0w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 24, 2025 at 3:42 PM Sergey Prokhorenko
<sergeyprokhorenko(at)yahoo(dot)com(dot)au> wrote:
>
>
>
> On Friday 24 October 2025 at 09:24:15 pm GMT+3, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
>
> On Fri, Oct 24, 2025 at 12:17 AM Sergey Prokhorenko
> <sergeyprokhorenko(at)yahoo(dot)com(dot)au> wrote:
> >
> >
> > Masahiko,
> >
> > Developers will still be able to use the long canonical 'hex' UUID format for compatibility. But the short format is not a developer choice, but a convention. We mustn't allow a situation where 25% of systems use base32hex, 25% use Crocksford's Base32, 25% use base36, and 25% even use erroneously sorted base64. That's a very real nightmare. You, too, have every reason not to want to increase the number of built-in functions in PostgreSQL.
> >
> > But here is a solution that I hope will satisfy everyone:
> >
> > encode('019535d9-3df7-79fb-b466-fa907fa17f9e', 'uuid_to_base32hex') -> 06AJBM9TUTSVND36VA87V8BVJO
>
> > Does it mean the first argument is uuid type data and when
> > 'uuid_to_base32hex' is specified as the format the function requires a
> > uuid data at the first argument?
>
> Yes, that's right.
> PostgreSQL will automatically cast the string '019535d9-3df7-79fb-b466-fa907fa17f9e' to the uuid type, since the format is correct.
>
>
> > I could not understand the difference
> > between specifying 'based32hex' and 'uuid_to_base32hex' when encoding
> > UUID data with base32hex encoding.
>
>
> 1. Specifying 'based32hex' in encode function means the first parameter is of bytea type as usual.
>
>
> But specifying 'uuid_to_base32hex' means the first parameter is of uuid type.
>
>
> 2. The encode function does not yet support format based32hex. Therefore, it is not known whether padding ===== should be used.
> But padding ===== is not used when specifying 'uuid_to_base32hex'.
>
>
> > decode('06AJBM9TUTSVND36VA87V8BVJO', 'base32hex_to_uuid') -> 019535d9-3df7-79fb-b466-fa907fa17f9e
>
> > Suppose that the decode() takes text data at the first argument and
> > returns UUID data, the function signature would be decode(text, text)
> > -> uuid. But we cannot create two functions that have the same name
> > and the same argument types.
>
> Yes, you're right. This is a problem that can't be solved without composite return values. We clearly took the wrong approach by coupling UUID conversion with encode/decode functions, which only apply to bytea. UUID and bytea are fundamentally different data types. Meanwhile, PostgreSQL has over 30 other type conversion functions that deal with other data types. For example, array_to_string, string_to_array, jsonb_to_record, to_char, to_timestamp, and to_hex. In this situation, the best solution would be to revert to the original uuid_to_base32hex() and base32hex_to_uuid() functions rather than deal with type incompatibility issues.
>

I think that type conversions and data encodings serve different
purposes. Type conversions express semantic transformations between
data types (e.g., text -> timestamp, jsonb -> record), while encodings
are simply representations of binary data as text. For the latter,
PostgreSQL already provides a well-defined abstraction through
encode()/decode(). Mixing encoding logic with type-specific
conversions would blur that boundary.

Also, if we start adding dedicated functions for each supported
encoding (uuid_to_base32hex, uuid_to_hex etc.), the number of
functions could easily multiply. That’s exactly what encode() and
decode() were designed to avoid.

While I agree that base32hex should be the recommended, I'm really not
sure it's a good design that PostgreSQL core should enforce it as the
only built-in method. It seems better to me to provide flexible
primitives, encode()/decode() plus UUID <-> bytea casts, and document
base32hex as the canonical convention (if necessary). Or providing
'uuid_encode(uuid, format text) -> text' and 'uuid_decode(text, format
text) -> uuid' might make sense too, but I'm not sure. I'd like to
hear opinions from other hackers too.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sami Imseih 2025-10-25 00:04:59 Re: Bug in pg_stat_statements
Previous Message Sergey Prokhorenko 2025-10-24 22:39:45 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions