Re: Optimize UUID parse using SIMD

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Haibo Yan <tristan(dot)yim(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimize UUID parse using SIMD
Date: 2026-06-25 22:16:00
Message-ID: CAD21AoAGKT8kj2tdYNq0xBnzicQkz5s=ctG2yx5xAKx-69C5YA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 25, 2026 at 2:31 PM Haibo Yan <tristan(dot)yim(at)gmail(dot)com> wrote:
>
>
>
> On Thu, Jun 25, 2026 at 11:28 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>
>> Hi all,
>>
>> I'd like to propose the $subject.
>>
>> Since commit ec8719ccbfcd made hex_decode_safe() SIMD-aware, decoding
>> a run of hex digits is now fast. The attached patch reuses
>> hex_decode_safe() in the UUID input function to speed up parsing.
>>
>> We accept several textual forms of a UUID[1]. The fast path handles
>> the common ones: 32 hex digits, the canonical 8x-4x-4x-4x-12x form
>> (where "nx" means n hex digits), and either of those wrapped in
>> braces. Otherwise, it falls back to the ordinary scalar UUID parse.
>>
>> I've benchmarked the parse speed using the following query:
>>
>> CREATE TEMP TABLE u AS SELECT gen_random_uuid()::text AS t FROM
>> generate_series(1, 1000000);
>> EXPLAIN (ANALYZE, TIMING OFF) SELECT t::uuid FROM u;
>>
>> I compared the execution time of the second query, which measures
>> uuid_in() alone, with/without SIMD optimization. Here are results (the
>> median of 5 runs):
>>
>> HEAD: 208.879 ms
>> Patched: 40.983 ms
>>
>> The improvements look promising to me. But in a realistic pipeline the
>> parse is a small fraction of the work, so end-to-end gains could be
>> much smaller.
>>
>> Feedback is very welcome.
>>
> I may be missing something, but I wonder whether the fast path is relying on
> slightly different input semantics from the existing UUID parser.
>
> In particular, hex_decode_safe() is not a strict “32 hex characters only”
> decoder. It skips whitespace, which is fine for its existing callers, but I
> don’t think UUID input should treat whitespace inside the UUID body as
> ignorable.

Good catch! hex_decode_safe() skips whitespace, so the patch wrongly
accepts the following malformed UUID value:

select '019f00b5-7f8a-722f-b707-59f0ed25cd  '::uuid;
                 uuid
--------------------------------------
 019f00b5-7f8a-722f-b707-59f0ed25cd00
(1 row)

> Also, since hex_decode_safe() returns void, the UUID fast path
> cannot verify that exactly UUID_LEN bytes were produced.

IIUC hex_decode_safe() does return the output length in bytes. So I
think we can fall back to the scalar UUID parser if
esctx.error_occurred is true or if the returned value is not 16
(UUID_LEN).
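
A minimal standalone sketch of that check, not the patch itself. Here
lenient_hex_decode() is a hypothetical, simplified stand-in for a decoder
that skips whitespace between byte pairs and returns the number of bytes
written, and uuid_fast_path() is an illustrative name. It only shows why
comparing the returned length against UUID_LEN catches whitespace in a
fixed-length 32-character input:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_LEN 16

/* Map one hex digit to its value, or -1 if it is not a hex digit. */
static int
hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical lenient decoder: whitespace between byte pairs is skipped.
 * Returns the number of bytes written, or -1 on an invalid or unpaired
 * digit. dst must have room for len / 2 bytes.
 */
static int
lenient_hex_decode(const char *src, size_t len, uint8_t *dst)
{
    const char *end = src + len;
    int         n = 0;

    while (src < end)
    {
        int hi, lo;

        if (*src == ' ' || *src == '\n' || *src == '\t' || *src == '\r')
        {
            src++;
            continue;
        }
        hi = hex_val(*src++);
        if (hi < 0 || src >= end)
            return -1;
        lo = hex_val(*src++);
        if (lo < 0)
            return -1;
        dst[n++] = (uint8_t) ((hi << 4) | lo);
    }
    return n;
}

/*
 * Fast path for exactly 32 source characters. Returns false when the
 * caller should fall back to the scalar parser.
 */
static bool
uuid_fast_path(const char *src, uint8_t out[UUID_LEN])
{
    uint8_t tmp[UUID_LEN];
    int     n;

    assert(src != NULL);
    if (strlen(src) != 2 * UUID_LEN)
        return false;
    n = lenient_hex_decode(src, 2 * UUID_LEN, tmp);
    if (n != UUID_LEN)          /* decode error, or whitespace was skipped */
        return false;
    memcpy(out, tmp, UUID_LEN);
    return true;
}
```

Since the source length is fixed at 32 characters, any skipped whitespace
leaves fewer than 32 hex digits, so the decoder either fails or returns
fewer than 16 bytes.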

>
> So I think it would be safer either to pre-validate that the 32 source
> characters are all hex digits before calling hex_decode_safe(), or to use a
> UUID-specific strict hex decoder for this path. After that, a comment
> explaining why hex_decode_safe() is safe here would make the invariant much
> clearer.

IIUC hex_decode_simd_helper() accepts only hex digits, so we could
reuse it for UUID parsing. Let me first check whether the idea above of
using the return value works for us.
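
For comparison, a scalar sketch of what a UUID-specific strict decoder
could look like. The name uuid_strict_hex_decode() is hypothetical, and
this says nothing about the actual interface of hex_decode_simd_helper().
It only illustrates the invariant that no character is ever skipped:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UUID_LEN 16

/* Value of a hex digit, or -1 for anything else (whitespace and NUL included). */
static int
strict_hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode a NUL-terminated string of exactly 32 hex digits into 16 bytes.
 * Returns false on the first non-hex character, so a shorter string is
 * rejected at its terminator without reading past it.
 */
static bool
uuid_strict_hex_decode(const char *src, uint8_t out[UUID_LEN])
{
    assert(src != NULL && out != NULL);

    for (int i = 0; i < UUID_LEN; i++)
    {
        int hi = strict_hex_val(src[2 * i]);
        int lo = (hi < 0) ? -1 : strict_hex_val(src[2 * i + 1]);

        if (lo < 0)
            return false;
        out[i] = (uint8_t) ((hi << 4) | lo);
    }
    /* Reject trailing characters after the 32nd digit. */
    return src[2 * UUID_LEN] == '\0';
}
```

With a decoder of this shape there is no need to inspect a returned
length, because every one of the 32 positions must hold a hex digit.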

>
> Could you also add a few regression tests for invalid inputs that contain
> whitespace inside otherwise fast-path-looking UUID strings? For example:
>
> ---------------------------------------------------------------
>
> SELECT 'a0eebc99 9c0b4ef8bb6d6bb9bd380a11'::uuid;
> SELECT 'a0eebc999c0b4ef8bb6d6bb9bd380a1 '::uuid;
> SELECT '{a0eebc999c0b4ef8bb6d6bb9bd380a1 }'::uuid;
> SELECT 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a1 '::uuid;
> ---------------------------------------------------------------
>
> These should continue to be rejected in the same way as the scalar parser.
> Regards,

Agreed.
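
To illustrate the four inputs quoted above, here is a small sketch of a
shape check for the fast path. uuid_fast_path_shape_ok() is a hypothetical
helper, not part of the patch. It returns true only for 32 hex digits or
the 8x-4x-4x-4x-12x form, optionally wrapped in braces. A false result
only means "use the scalar parser", which then decides whether the input
is rejected:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool
is_hex(char c)
{
    return (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'f') ||
           (c >= 'A' && c <= 'F');
}

/*
 * Hypothetical classifier: true only if s is 32 hex digits or the
 * 8-4-4-4-12 form, optionally wrapped in braces, with no other characters.
 */
static bool
uuid_fast_path_shape_ok(const char *s)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);

    /* Strip one pair of surrounding braces, if present. */
    if (len >= 2 && s[0] == '{' && s[len - 1] == '}')
    {
        s++;
        len -= 2;
    }
    if (len != 32 && len != 36)
        return false;

    for (size_t i = 0; i < len; i++)
    {
        /* In the 36-character form, hyphens sit at these offsets. */
        bool want_hyphen = (len == 36) &&
                           (i == 8 || i == 13 || i == 18 || i == 23);

        if (want_hyphen ? (s[i] != '-') : !is_hex(s[i]))
            return false;
    }
    return true;
}
```

All four quoted inputs fail this check, so they would reach the scalar
parser unchanged.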

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
