Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Aleksander Alekseev <aleksander(at)tigerdata(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>
Subject: Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Date: 2026-03-14 04:10:31
Message-ID: CAD21AoCcikUgU5coyvM7PLJg5M-75LzQhm-Uxmqn_7M=DHkj+w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 18, 2026 at 6:58 AM Aleksander Alekseev
<aleksander(at)tigerdata(dot)com> wrote:
>
> Hi,
>
> > I only rebased v3 and improved the commit messages, but I didn't
> > account for Masahiko Sawada's feedback for 0002. Andrey, are you still
> > working on this or others can pick it up?
> >
> > The patch is not on the commitfest, so I'm about to add it.
>
> Here is patch v5 where I accounted for the previous feedback from
> Masahiko Sawada and also made some other changes, see below.
>
> > How about the error message like "invalid input length for type uuid"?
> > I think "uuid" should be lower case as it indicates PostgreSQL uuid
> > data type, and it's better to use %s format instead of directly
> > writing "uuid" (see string_to_uuid() for example).
>
> Makes sense. Fixed.
>
> > As for the errdetail message, should we add "bytea" also after "got %d"?
>
> You probably meant "got %d bytes", not "got %d bytea". I believe the
> current message is fine, but maybe native speakers will correct us.
>
> > We already have tests for casting bytes to integer data types in
> > strings.sql. I suggest moving the casting tests from bytea to uuid
> > into therel.
>
> I disagree on the grounds that there are zero tests related to UUID in
> strings.sql; uuid.sql is a more appropriate place for these tests IMO.
> However if someone seconds the idea we can easily move the tests at
> any time.
>
> > For the uuid.sql file, we could add a test to verify that
> > a UUID value remains unchanged when it's cast to bytea and back to
> > UUID. For example,
> >
> > SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
>
> Good point. Added.
>
> > base32hex_encode() doesn't seem to add '=' paddings, but is it
> > intentional? I don't see any description in RFC 4648 that we can omit
> > '=' paddings.
>
> You are right, both base32 and base32hex should add paddings;
> substring() can be used if necessary. Fixed.
>
> > I think the patch should add tests not only for uuid data type but
> > also for general cases like other encodings.
>
> Yes, and the good place for these tests would be closer to other tests
> for encode() and decode() i.e. strings.sql. Fixed.
>
> While working on it I noticed some inconsistencies between base32hex
> implementation and our current implementation of base64. As an
> example, we don't allow `=` input:
>
> ```
> =# SELECT decode('=', 'base64');
> ERROR: unexpected "=" while decoding base64 sequence
> ```
>
> ... while base32hex did. I fixed such inconsistencies too.
>
> > In uuid.sql tests, how about adding some tests to check if base32hex
> > maintains the sortability of UUIDv7 data?
>
> Agree. Added.
>
> > I think we should update the documentation in the uuid section about
> > casting data between bytea and uuid. For references, we have a similar
> > description for bytea and integer[1].
>
> Fair point. Fixed.
>

Thank you for updating the patch!

I've reviewed both patches and have some comments.

* 0001 patch

+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("invalid input length for type %s", "uuid"),
+ errdetail("Expected %d bytes, got %d.", UUID_LEN, len)));

I think we need to handle plural and singular forms depending on the
value. Or we can change it to "Expected size %d, got %d".

* 0002 patch:

errhint("Valid encodings are \"%s\", \"%s\", \"%s\",
and \"%s\".",
"base64", "base64url", "escape", "hex")));

We need to add 'base32hex' here.

---
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unexpected \"=\" while decoding
%s sequence",
+ "base32hex")));

I think we can directly write 'base32hex' in the error message.

---
+ /* Verify no extra bits remain (padding bits should be zero) */
+ if (bits_in_buffer > 0 && (bits_buffer & ((1ULL << bits_in_buffer)
- 1)) != 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid base32hex end sequence"),
+ errhint("Input data has non-zero padding bits.")));

This code checks if the remaining bits of the input data are all zero.
IIUC we don't have a similar check for base64 and base64url. For
instance, the following input data is accepted:

=# select decode('AB', 'base64');
decode
--------
\x00
(1 row)

I think it's better to have consistent behavior across our encoding.

I've attached a patch for the 0002 patch part that fixes the above
points (except for the last point) and has some minor fixes as well.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
fix_0002_masahiko.patch text/x-patch 7.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2026-03-14 07:00:00 Re: Instability of phycodorus in pg_upgrade tests with JIT
Previous Message Japin Li 2026-03-14 03:07:02 Re: Use SMgrRelation instead of SMgrRelationData * in pgaio_io_set_target_smgr()