Re: Optimize UUID parse using SIMD

From:	Haibo Yan <tristan(dot)yim(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Optimize UUID parse using SIMD
Date:	2026-06-29 02:20:20
Message-ID:	CABXr29GkTK9RUqBuV9iK_mYKjjUu1sWg7j7ZB_wtQNv93Z0WYA@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jun 25, 2026 at 3:16 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Jun 25, 2026 at 2:31 PM Haibo Yan <tristan(dot)yim(at)gmail(dot)com> wrote:
> >
> >
> >
> > On Thu, Jun 25, 2026 at 11:28 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >>
> >> Hi all,
> >>
> >> I'd like to propose the $subject.
> >>
> >> Since commit ec8719ccbfcd made hex_decode_safe() SIMD-aware, decoding
> >> a run of hex digits is now fast. The attached patch reuses
> >> hex_decode_safe() in the UUID input function to speed up parsing.
> >>
> >> We accept several textual forms of a UUID[1]. The fast path handles
> >> the common ones: 32 hex digits, the canonical 8x-4x-4x-4x-12x form
> >> (where "nx" means n hex digits), and either of those wrapped in
> >> braces. Otherwise, it falls back to the ordinary scalar UUID parse.
> >>
> >> I've benchmarked the parse speed using the following query:
> >>
> >> CREATE TEMP TABLE u AS SELECT gen_random_uuid()::text AS t FROM
> >> generate_series(1, 1000000);
> >> EXPLAIN (ANALYZE, TIMING OFF) SELECT t::uuid FROM u;
> >>
> >> I compared the execution time of the second query, which measures
> >> uuid_in() alone, with and without the SIMD optimization. Here are the
> >> results (median of 5 runs):
> >>
> >> HEAD: 208.879 ms
> >> Patched: 40.983 ms
> >>
> >> The improvements look promising to me. But in a realistic pipeline the
> >> parse is a small fraction of the work, so end-to-end gains could be
> >> much smaller.
> >>
> >> Feedback is very welcome.
> >>
> > I may be missing something, but I wonder whether the fast path is relying on
> > slightly different input semantics from the existing UUID parser.
> >
> > In particular, hex_decode_safe() is not a strict “32 hex characters only”
> > decoder. It skips whitespace, which is fine for its existing callers, but I
> > don’t think UUID input should treat whitespace inside the UUID body as
> > ignorable.
>
> Good catch! hex_decode_safe() skips whitespace, so the patch accepts
> the following UUID value, which is bad:
>
> select '019f00b5-7f8a-722f-b707-59f0ed25cd '::uuid;
> uuid
> --------------------------------------
> 019f00b5-7f8a-722f-b707-59f0ed25cd00
> (1 row)
>
> > Also, since hex_decode_safe() returns void, the UUID fast path
> > cannot verify that exactly UUID_LEN bytes were produced.
>
> IIUC hex_decode_safe() does return the output length in bytes. So I
> think we can fall back to the scalar UUID parser if
> esctx.error_occurred is true or if the returned value is not 16.
>

You’re right, I misread that part. Checking both esctx.error_occurred and
the returned length sounds good to me.
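To make the length check concrete, here is a minimal standalone sketch. lenient_hex_decode() is a hypothetical stand-in that only mimics the two properties discussed here (it skips whitespace between bytes and returns the number of bytes written); it is not PostgreSQL's hex_decode_safe() and does not use SIMD. The point is the invariant: assuming the fast path feeds exactly 32 source characters, getting 16 bytes back with no error implies every one of them was a hex digit, so any other outcome can simply fall back to the scalar parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns the value of a hex digit, or -1 if c is not one. */
static int
hex_nibble(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical lenient decoder: skips whitespace between bytes and returns
 * the number of bytes written. Sets *error on a non-hex character or on a
 * dangling final nibble (odd number of digits).
 */
static size_t
lenient_hex_decode(const char *src, size_t len, unsigned char *dst, bool *error)
{
    const char *s = src;
    const char *end = src + len;
    size_t      n = 0;

    *error = false;
    while (s < end)
    {
        int         hi, lo;

        /* Whitespace between bytes is silently skipped. */
        if (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r')
        {
            s++;
            continue;
        }
        hi = hex_nibble(*s++);
        if (hi < 0 || s >= end)
        {
            *error = true;      /* non-hex digit or odd number of digits */
            return n;
        }
        lo = hex_nibble(*s++);
        if (lo < 0)
        {
            *error = true;
            return n;
        }
        dst[n++] = (unsigned char) ((hi << 4) | lo);
    }
    return n;
}

/*
 * Fast path for the plain 32-hex-digit form. Returns false to mean
 * "fall back to the scalar parser".
 */
static bool
uuid_fast_path_32(const char *src, size_t len, unsigned char out[16])
{
    unsigned char buf[16];
    bool        error;
    size_t      n;

    if (len != 32)
        return false;
    n = lenient_hex_decode(src, len, buf, &error);
    assert(n <= 16);            /* 32 input chars can yield at most 16 bytes */

    /* Both checks are needed: whitespace can shorten the output silently. */
    if (error || n != 16)
        return false;
    memcpy(out, buf, 16);
    return true;
}
```

For example, 30 hex digits followed by two spaces decode to 15 bytes with no error, which is exactly the case that slipped through before and that the n != 16 check catches.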

> >
> > So I think it would be safer either to pre-validate that the 32 source
> > characters are all hex digits before calling hex_decode_safe(), or to use a
> > UUID-specific strict hex decoder for this path. After that, a comment
> > explaining why hex_decode_safe() is safe here would make the invariant much
> > clearer.
>
> IIUC hex_decode_simd_helper() accepts only hex digits, so we could
> reuse it for UUID parsing. Let me first check whether the above idea
> of using the return value works for us.
>

That sounds reasonable. My main concern was to keep the fast path’s accepted
input set identical to that of the scalar UUID parser. Falling back when the
decoded length is not UUID_LEN, together with regression tests for whitespace
cases, should address that.
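Alternatively, along the lines of the pre-validation idea above, the fast path could gather the 32 digit characters from whichever fast-path form it recognizes and check them all before decoding. Another minimal sketch, again with hypothetical names (uuid_fast_path_strict(), hex_value()) and a plain scalar loop standing in for the SIMD decoder. It recognizes only the four forms from the original proposal (32 digits, canonical 8x-4x-4x-4x-12x, and either one in braces) and returns false, meaning "fall back to the scalar parser", for everything else, including any other hyphen placement the scalar parser accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns the value of a hex digit, or -1 if c is not one. */
static int
hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical strict fast path. Returns true and fills out[] only if the
 * input is one of the four fast-path forms and all 32 digits are hex;
 * otherwise returns false so the caller uses the scalar parser.
 */
static bool
uuid_fast_path_strict(const char *s, size_t len, unsigned char out[16])
{
    char        hex[32];

    /* Strip a matching pair of braces around a 32- or 36-char body. */
    if ((len == 34 || len == 38) && s[0] == '{' && s[len - 1] == '}')
    {
        s++;
        len -= 2;
    }

    if (len == 32)
        memcpy(hex, s, 32);
    else if (len == 36)
    {
        /* Canonical 8x-4x-4x-4x-12x form: hyphens at fixed offsets. */
        size_t      j = 0;

        if (s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-')
            return false;
        for (size_t i = 0; i < 36; i++)
            if (i != 8 && i != 13 && i != 18 && i != 23)
                hex[j++] = s[i];
        assert(j == 32);
    }
    else
        return false;

    /* Pre-validate: whitespace or any other non-hex character means fall back. */
    for (size_t i = 0; i < 32; i++)
        if (hex_value(hex[i]) < 0)
            return false;

    /* Every character is known to be hex, so decoding cannot fail. */
    for (size_t i = 0; i < 16; i++)
        out[i] = (unsigned char) ((hex_value(hex[2 * i]) << 4) |
                                  hex_value(hex[2 * i + 1]));
    return true;
}
```

All four whitespace inputs from the regression-test list quoted below return false here, so they would reach the scalar parser and be rejected exactly as before.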

> >
> > Could you also add a few regression tests for invalid inputs that contain
> > whitespace inside otherwise fast-path-looking UUID strings? For example:
> >
> > ---------------------------------------------------------------
> >
> > SELECT 'a0eebc99 9c0b4ef8bb6d6bb9bd380a11'::uuid;
> > SELECT 'a0eebc999c0b4ef8bb6d6bb9bd380a1 '::uuid;
> > SELECT '{a0eebc999c0b4ef8bb6d6bb9bd380a1 }'::uuid;
> > SELECT 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a1 '::uuid;
> > ---------------------------------------------------------------
> >
> > These should continue to be rejected in the same way as by the scalar parser.
> >
> > Regards,
>
> Agreed.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com

Thanks!

Regards,
Haibo

In response to

Re: Optimize UUID parse using SIMD at 2026-06-25 22:16:00 from Masahiko Sawada