Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

From: Sergey Prokhorenko <sergeyprokhorenko(at)yahoo(dot)com(dot)au>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Date: 2025-10-27 22:36:51
Message-ID: 1154454839.957923.1761604611424@mail.yahoo.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers




On Sat, Oct 25, 2025 at 11:07 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
>
>
> > On 25 Oct 2025, at 04:31, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Or providing
> > 'uuid_encode(uuid, format text) -> text' and 'uuid_decode(text, format
> > text) -> uuid' might make sense too, but I'm not sure.
>
> I like the idea, so I drafted a prototype for discussion.
> Though I do not see what else methods should be provided along with added one...

Thank you for drafting the patch! But I find it potentially confusing
to have different encoding methods for bytea and UUID types. I don't
see a compelling reason why the core should support base32hex
exclusively for the UUID data type, nor why base32hex should be the
only encoding method that the core provides for UUIDs (while we can
use it by default).

If we implement uuid_encode() and uuid_decode(), we might end up
creating similar encoding and decoding functions for other data types
as well, which doesn't seem like the best approach. I still believe
that extending the existing encode() and decode() functions is a
better starting point.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
________________________________________________

Masahiko,
I wanted to highlight an important discussion among the authors and contributors of RFC 9562 regarding UUID text encoding:
https://github.com/uuid6/new-uuid-encoding-techniques-ietf-draft/discussions/17#discussioncomment-10614817
The RFC 9562 authors and contributors reached consensus that standardizing an alternate short text format for UUIDs is important. While the community debated between base32hex (RFC 4648) and Crockford's Base32, both were recognized for preserving lexicographical sort order, a critical property for database primary keys and URL-safe identifiers. Time constraints prevented inclusion in RFC 9562, but the discussion established that base32hex is the existing standard format already defined in RFC 4648, Section 7, specifically designed for sort-preserving encoding.

This context is crucial because it underscores that the uuid type, as a first-class concept, deserves its own standardized text encoding.

Regarding the proposal to couple UUID encoding with the bytea type through encode()/decode() functions: I understand the appeal of reusing existing infrastructure, but this creates a conceptual mismatch. UUID is a distinct semantic type in PostgreSQL, not merely binary data. The bytea type has existed for decades without base32hex encoding, and that's worked fine, because bytea represents arbitrary binary data, not universally unique identifiers with specific structural properties and needs.
Consider PostgreSQL's own design philosophy. The documentation states:
"9.5. Binary String Functions and Operators  This section describes functions and operators for examining and manipulating binary strings, that is values of type bytea. Many of these are equivalent, in purpose and syntax, to the text-string functions described in the previous section."
PostgreSQL maintains parallel function sets for text strings and bytea precisely because they serve different purposes, despite the implementation overhead. The uuid type deserves the same treatment: it's not just another binary blob, but a type with specific semantics (uniqueness, version bits, variant encoding) and use cases (distributed identifiers, sortable keys, URL-safe representations).
Why should uuid be treated as a second-class citizen and forced through bytea conversion, when text and bytea each have their own dedicated function families?
You've been very careful in your previous arguments to separate data type conversion from encoding/decoding operations. I appreciate that rigor. However, the current proposal to route UUID encoding through bytea contradicts that principle. It merges two fundamentally different data types for convenience rather than correctness.
If someone wants to add base32hex encoding/decoding to bytea for general binary data operations, that's a worthwhile but separate discussion. The uuid type, however, needs native base32hex support to fulfill its role as a first-class PostgreSQL type with a standardized compact text representation, as recommended by the RFC 9562 community.

I would value your thoughts on these arguments.

Best regards,
Sergey Prokhorenko

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2025-10-27 22:47:08 Re: another autovacuum scheduling thread
Previous Message Sami Imseih 2025-10-27 22:35:12 Re: another autovacuum scheduling thread