From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | "Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com" <Chiranmoy(dot)Bhattacharya(at)fujitsu(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Devanga(dot)Susmitha(at)fujitsu(dot)com" <Devanga(dot)Susmitha(at)fujitsu(dot)com>, "Ragesh(dot)Hajela(at)fujitsu(dot)com" <Ragesh(dot)Hajela(at)fujitsu(dot)com> |
Subject: | Re: [PATCH] Hex-coding optimizations using SVE on ARM. |
Date: | 2025-09-24 21:40:49 |
Message-ID: | aNRlYQoxTe8lOIEc@nathan |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Sep 24, 2025 at 10:59:38AM +0700, John Naylor wrote:
> + if (unlikely(!hex_decode_simd_helper(srcv, &dstv1)))
> + break;
>
> But if you really want to do something here, sprinkling "(un)likely"'s
> here seems like solving the wrong problem (even if they make any
> difference), since the early return is optimizing for exceptional
> conditions. In other places (cf. the UTF8 string verifier), we
> accumulate errors, and only if we have them at the end do we restart
> from the beginning with the slow error-checking path that can show the
> user the offending input.
I switched to an accumulator approach in v11.
> +vector8_sssub(const Vector8 v1, const Vector8 v2)
>
> It's hard to parse "sss", so maybe we can borrow an Intel-ism and use
> "iss" for the signed case?
Done.
> +/* vector manipulation */
> +#ifndef USE_NO_SIMD
> +static inline Vector8 vector8_interleave_low(const Vector8 v1, const
> Vector8 v2);
> +static inline Vector8 vector8_interleave_high(const Vector8 v1, const
> Vector8 v2);
> +static inline Vector8 vector8_pack_16(const Vector8 v1, const Vector8 v2);
> +static inline Vector32 vector32_shift_left_nibble(const Vector32 v1);
> +static inline Vector32 vector32_shift_right_nibble(const Vector32 v1);
> +static inline Vector32 vector32_shift_right_byte(const Vector32 v1);
>
> Do we need declarations for these? I recall that the existing
> declarations are there for functions that are also used internally.
Removed.
> The nibble/byte things are rather specific. Wouldn't it be more
> logical to expose the already-generic shift operations and let the
> caller say by how much? Or does the compiler refuse because the
> intrinsic doesn't get an immediate value? Some are like that, but I'm
> not sure about these. If so, that's annoying and I wonder if there's a
> workaround.
Yeah, the compiler refuses unless the value is an integer literal. I
thought of using a switch statement to cover all the values used in-tree,
but I didn't like that, either.
> +vector8_has_ge(const Vector8 v, const uint8 c)
> +{
> +#ifdef USE_SSE2
> + Vector8 umax = _mm_max_epu8(v, vector8_broadcast(c));
> + Vector8 cmpe = _mm_cmpeq_epi8(umax, v);
> +
> + return vector8_is_highbit_set(cmpe);
>
> We take pains to avoid signed comparison on unsigned input for the
> "le" case, and I don't see why it's okay here.
_mm_max_epu8() does unsigned comparisons, I think...
> Do the regression tests have long enough cases that test exceptional
> paths, like invalid bytes and embedded whitespace? If not, we need
> some.
Added.
I've also fixed builds on gcc/arm64, as discussed elsewhere [0]. Here are
the current numbers on my laptop:
arm
buf | HEAD | patch | % diff
-------+-------+-------+--------
2 | 4 | 4 | 0
4 | 6 | 6 | 0
8 | 8 | 8 | 0
16 | 11 | 12 | -9
32 | 18 | 5 | 72
64 | 38 | 6 | 84
256 | 134 | 17 | 87
1024 | 513 | 63 | 88
4096 | 2081 | 262 | 87
16384 | 8524 | 1058 | 88
65536 | 34731 | 4224 | 88
[0] https://postgr.es/m/aNQtN89VW8z-yo3B%40nathan
--
nathan
Attachment | Content-Type | Size |
---|---|---|
v11-0001-Optimize-hex_encode-and-hex_decode-using-SIMD.patch | text/plain | 16.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Kim | 2025-09-24 21:50:44 | Re: Proposal for enabling auto-vectorization for checksum calculations |
Previous Message | Melanie Plageman | 2025-09-24 21:33:08 | Re: Incorrect logic in XLogNeedsFlush() |