Re: Add RISC-V Zbb popcount optimization

From: "Greg Burd" <greg(at)burd(dot)me>
To: "Andres Freund" <andres(at)anarazel(dot)de>
Cc: "Nathan Bossart" <nathandbossart(at)gmail(dot)com>, "John Naylor" <johncnaylorls(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Subject: Re: Add RISC-V Zbb popcount optimization
Date: 2026-05-29 15:44:08
Message-ID: 0c5901cb-6e58-47f6-aad0-f880e091ab37@app.fastmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for taking a look at this.

On 2026-05-28 14:17:16 +0000, Andres Freund wrote:
> How confident are we that this bug just affects DES?

It doesn't. I scanned master for the scatter-write idiom that des_init() uses
(`dst[idx[i]] = expr`) and then compiled each candidate on greenfly with both
gcc-13 and clang-20 at -O2 with -march=rv64gcv, diffing the disassembly for
indexed-store instructions. The static scan turned up ~30 sites; clang-20
actually emits RVV scatter/strided stores in three of them, where gcc-13 emits
scalar code:

contrib/pgcrypto/crypt-des.c des_init 3x vsoxei8 (the known case)
src/timezone/zic.c ~L2330 omittype[] 2x vsoxei8 (byte dest, same shape)
contrib/pg_trgm/trgm_op.c compactTrgm 3x vsse8 (strided, not indexed)

The zic.c hit is the one that bothers me. It's the same shape as des_init() --
byte-sized destination, scatter store via an indexing array -- and it's not
exercised by `make check`, since zic runs at `make install` to compile the IANA
tz data. If clang-20 miscompiles zic the same way it miscompiles des_init(),
the buildfarm wouldn't notice and the resulting tzdata would silently be wrong.
I haven't yet built and run that path on greenfly with clang-20 to confirm the
miscompile actually triggers there; I'll do that next.

The trgm_op.c hit is structurally different (strided store, not indexed
scatter), so it isn't the #176001 pattern. I'm flagging it because it's in the
same family of clang-20 RVV transformations on byte destinations, not because I
have a known miscompile for it.

Several other sites that match the source pattern (ginlogic.c MAYBE- twiddle,
spi.c attribute scatter, spgtextproc.c, ...) survive clang-20 codegen
unvectorized -- the autovec heuristics don't fire on the variable-length,
control-flow-dependent loops where this idiom tends to live in our tree. So the
*visible* exposure today appears bounded to crypt-des.c and zic.c, but I would
not bet money that the next clang release won't extend the autovec to those.

One more data point: the scatter stores only appear at -O2 and -O3, not -O1.
Our default is -O2.

> Have you confirmed that, by using a newer clang, the merging of the
> fixes actually fixes the problem?

Not yet. apt.llvm.org doesn't ship riscv64 packages, so I'm building clang from
llvm-project main on greenfly now (currently mid-build, ~24h on this CPU under
nice). Once it's installed I'll rerun the asm audit and the test suite under
it; if HEAD generates correct code for crypt-des.c and zic.c that's a real data
point for "reject affected clang" and gives us a version range for the cutoff.

> ISTM a perfectly viable patch would be to just reject building with a
> non-very-recent clang on riscv.

It's the cleanest answer if HEAD is in fact fixed, and I'd lean that way. The
cost is excluding the clang that ships in current Debian/ Ubuntu stable on
riscv64 (clang 20 in noble / trixie), which is what most riscv64 users actually
have today. I'd rather have the clang-HEAD bisect data before picking the
cutoff -- "reject < clang-N" is much more defensible with N pinned to a specific
fix.

> That seems like a pretty odd fix for the problem. If the problem is
> auto-vectorization, we should stop auto-vectorization, not sprinkle
> memory barriers around.

Agreed. pg_memory_barrier() was the first thing that made the test pass and I
shipped it without going back to the right primitive. The audit makes per-loop
pragmas look worse, not better: the affected sites aren't all in one file, and
one of them (zic.c) is third-party- ish code we mostly don't touch. In rough
order of locality:

1. #pragma clang loop vectorize(disable) on the four affected loops
in des_init() (and the equivalent in zic.c, if confirmed). Doesn't
scale if the next clang release vectorizes a fifth site.
2. A pg_attribute_no_vectorize on des_init() / the affected zic
function. Coarser, defensive against new scatter writes inside
the same function.
3. Configure-time -fno-tree-vectorize on crypt-des.c (and zic.c)
for clang on riscv. File scope. Simple and static.
4. Configure-time -fno-tree-vectorize globally for clang+riscv64
until clang-N, where N is the bisected fix. Or refuse < N.
5. Hard error in configure for clang versions in the affected range.

After the audit my preference shifted from (1) to (3) or (4): the bug isn't
DES-specific, the patch shouldn't be either, and decorating loops one at a time
as we trip over them is exactly the "sprinkle barriers around" complaint applied
a level up. (4) would also fix the secondary intermittent issue we've seen on
greenfly that I haven't been able to attribute to any specific loop.

I'll send v5 once I have:
- clang-HEAD on greenfly and the rerun audit/test results
- confirmation (or refutation) that the zic.c miscompile actually
triggers, not just that the scatter store is emitted

If both confirm what the asm audit suggests, v5 will drop the
pg_memory_barrier() approach in 0001 and replace it with either (3)
or (4)+(5), depending on whether clang-HEAD passes.

The other two patches do improve CRC32 and popcount on RISC-V, they are still
worth considering.

best.

-greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2026-05-29 15:47:06 Re: Heads Up: cirrus-ci is shutting down June 1st
Previous Message Jacob Champion 2026-05-29 15:43:07 Re: future of PQfn()