Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

From: Sergey Prokhorenko <sergeyprokhorenko(at)yahoo(dot)com(dot)au>
To: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Date: 2025-10-28 14:28:54
Message-ID: 1543828736.1122782.1761661734535@mail.yahoo.com
Lists: pgsql-hackers

First of all, I'm definitely a proponent of being able to encode UUIDs using base32hex in Postgres.

On Mon, 27 Oct 2025 at 23:37, Sergey Prokhorenko
<sergeyprokhorenko(at)yahoo(dot)com(dot)au> wrote:
> I wanted to highlight an important discussion among the authors and contributors of RFC 9562 regarding UUID text encoding:
>
> https://github.com/uuid6/new-uuid-encoding-techniques-ietf-draft/discussions/17#discussioncomment-10614817

I think a very important thing to note here is that this is a github
discussion, not an officially accepted RFC. I think if it was an
officially accepted RFC on how to encode UUIDs then you would have a
lot less pushback here. Right now your emails mostly read like you
want to push your preferential format, while essentially disallowing
other encodings. While base32hex seems like a good choice for UUIDv7 I
see no reason to give it preferential treatment at this point in time.
Crockford base32 seems just as valid. And e.g. base64url[1] seems
totally fine for UUID versions that have no inherent ordering like
UUIDv4. And if someone comes up with a base64urlhex format you could
have even shorter but still sortable UUIDs at the expense of
legibility.

The main reason why a specific encoding should receive preferential
treatment in Postgres would be if it were standardized, as that would
help with interoperability. At this point in time there's no such
standard (not even a draft), so forcing an explicit encoding will
actually reduce interoperability, because people already encode their
UUIDs in various different forms.

> but the discussion established that base32hex is the existing standard format already defined in RFC 4648, Section 7, specifically designed for sort-preserving encoding.

You even reach a similar conclusion here: not choosing Crockford
base32, purely because it does not have an official RFC.

> This context is crucial because it underscores that the uuid type, as a first-class concept, deserves its own standardized text encoding.

It already has one! The standard text encoding is defined in RFC 4122.
That's why Postgres displays it as such when encoding to text.

> Regarding the proposal to couple UUID encoding with the bytea type through encode()/decode() functions: I understand the appeal of reusing existing infrastructure, but this creates a conceptual mismatch. UUID is a distinct semantic type in PostgreSQL, not merely binary data. The bytea type has existed for decades without base32hex encoding, and that's worked fine, because bytea represents arbitrary binary data, not universally unique identifiers with specific structural properties and needs.

I think the first step, by far, is to make the encoding of UUIDs in
different formats possible in Postgres. The way to do so with the
least API impact (and thus, as you noticed, the least pushback) would be
to add base32hex to the list of encoding formats in the encode/decode
functions. Then combining that with UUID <-> bytea casting (which also
seems totally reasonable functionality to me), would give you the
functionality (but not the defaults you want).

In a follow up patch I would personally be fine making the API to
encode UUIDs a bit more friendly. In particular, adding an overload to
the encode function that takes a UUID instead of a bytea seems
reasonable to me, i.e. encode(id uuid, format text) -> text

I'm currently less convinced about a decode_uuid function though. I
think some perf argument (including some benchmarks) would need to be
made to convince me of its usefulness. Because purely from an API
friendliness lens, I feel like decode('...', 'base32hex')::uuid and
decode_uuid('...', 'base32hex') rank basically the same.

Once/if an accepted RFC actually defines a default shorter encoding
for UUIDs, I would definitely be in favor of adding a decode_uuid
function with that encoding configured as a default argument, as well
as adding the same default argument to the UUID overload of encode.
______________________________________________________________________________________________

Hi Jelte,

I agree with your points.

I believe we should put the discussion about compact UUID text encoding in PostgreSQL on hold for now. None of the proposed solutions has sufficient unconditional support from the participants. It makes sense to pause this discussion for more in-depth exploration to try and reach a consensus.

Jelte, I particularly liked your idea of a new, dedicated, standardized encoding for UUIDs, base64urlhex, and dedicated encoding/decoding functions for this encoding in PostgreSQL. I will try to develop such an encoding and submit it for discussion. I suggest calling it base64uuid.

My current attempt to establish base32hex as a de facto standard (even prior to an RFC) was unsuccessful. However, I remain convinced, like the authors of RFC 9562, that there should be only one standard compact encoding for UUIDs. Therefore, we must continue efforts to standardize such an encoding.

As for Crockford's Base32, it was rejected because of a lack of support in standard programming language libraries. Otherwise, it's just as good as base32hex.

Best regards,
Sergey Prokhorenko
