Quick Links

Re: [PATCH] Hex-coding optimizations using SVE on ARM.

From:	Nathan Bossart <nathandbossart(at)gmail(dot)com>
To:	John Naylor <johncnaylorls(at)gmail(dot)com>
Cc:	"Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com>
Subject:	Re: [PATCH] Hex-coding optimizations using SVE on ARM.
Date:	2025-09-24 21:40:49
Message-ID:	aNRlYQoxTe8lOIEc@nathan
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Sep 24, 2025 at 10:59:38AM +0700, John Naylor wrote:
> + if (unlikely(!hex_decode_simd_helper(srcv, &dstv1)))
> + break;
>
> But if you really want to do something here, sprinkling "(un)likely"'s
> here seems like solving the wrong problem (even if they make any
> difference), since the early return is optimizing for exceptional
> conditions. In other places (cf. the UTF8 string verifier), we
> accumulate errors, and only if we have them at the end do we restart
> from the beginning with the slow error-checking path that can show the
> user the offending input.

I switched to an accumulator approach in v11.

> +vector8_sssub(const Vector8 v1, const Vector8 v2)
>
> It's hard to parse "sss", so maybe we can borrow an Intel-ism and use
> "iss" for the signed case?

Done.

> +/* vector manipulation */
> +#ifndef USE_NO_SIMD
> +static inline Vector8 vector8_interleave_low(const Vector8 v1, const
> Vector8 v2);
> +static inline Vector8 vector8_interleave_high(const Vector8 v1, const
> Vector8 v2);
> +static inline Vector8 vector8_pack_16(const Vector8 v1, const Vector8 v2);
> +static inline Vector32 vector32_shift_left_nibble(const Vector32 v1);
> +static inline Vector32 vector32_shift_right_nibble(const Vector32 v1);
> +static inline Vector32 vector32_shift_right_byte(const Vector32 v1);
>
> Do we need declarations for these? I recall that the existing
> declarations are there for functions that are also used internally.

Removed.

> The nibble/byte things are rather specific. Wouldn't it be more
> logical to expose the already-generic shift operations and let the
> caller say by how much? Or does the compiler refuse because the
> intrinsic doesn't get an immediate value? Some are like that, but I'm
> not sure about these. If so, that's annoying and I wonder if there's a
> workaround.

Yeah, the compiler refuses unless the value is an integer literal. I
thought of using a switch statement to cover all the values used in-tree,
but I didn't like that, either.

> +vector8_has_ge(const Vector8 v, const uint8 c)
> +{
> +#ifdef USE_SSE2
> + Vector8 umax = _mm_max_epu8(v, vector8_broadcast(c));
> + Vector8 cmpe = _mm_cmpeq_epi8(umax, v);
> +
> + return vector8_is_highbit_set(cmpe);
>
> We take pains to avoid signed comparison on unsigned input for the
> "le" case, and I don't see why it's okay here.

_mm_max_epu8() does unsigned comparisons, I think...

> Do the regression tests have long enough cases that test exceptional
> paths, like invalid bytes and embedded whitespace? If not, we need
> some.

Added.

I've also fixed builds on gcc/arm64, as discussed elsewhere [0]. Here are
the current numbers on my laptop:

arm
buf | HEAD | patch | % diff
-------+-------+-------+--------
2 | 4 | 4 | 0
4 | 6 | 6 | 0
8 | 8 | 8 | 0
16 | 11 | 12 | -9
32 | 18 | 5 | 72
64 | 38 | 6 | 84
256 | 134 | 17 | 87
1024 | 513 | 63 | 88
4096 | 2081 | 262 | 87
16384 | 8524 | 1058 | 88
65536 | 34731 | 4224 | 88

[0] https://postgr.es/m/aNQtN89VW8z-yo3B%40nathan

--
nathan

Attachment	Content-Type	Size
v11-0001-Optimize-hex_encode-and-hex_decode-using-SIMD.patch	text/plain	16.2 KB

In response to

Re: [PATCH] Hex-coding optimizations using SVE on ARM. at 2025-09-24 03:59:38 from John Naylor

Responses

Re: [PATCH] Hex-coding optimizations using SVE on ARM. at 2025-09-25 14:16:35 from John Naylor

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andrew Kim	2025-09-24 21:50:44	Re: Proposal for enabling auto-vectorization for checksum calculations
Previous Message	Melanie Plageman	2025-09-24 21:33:08	Re: Incorrect logic in XLogNeedsFlush()