| From: | "Greg Burd" <greg(at)burd(dot)me> |
|---|---|
| To: | "Andres Freund" <andres(at)anarazel(dot)de> |
| Cc: | "Nathan Bossart" <nathandbossart(at)gmail(dot)com>, "John Naylor" <johncnaylorls(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Andrew Dunstan" <andrew(at)dunslane(dot)net> |
| Subject: | Re: Add RISC-V Zbb popcount optimization |
| Date: | 2026-05-29 15:44:08 |
| Message-ID: | 0c5901cb-6e58-47f6-aad0-f880e091ab37@app.fastmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
Thanks for taking a look at this.
On 2026-05-28 14:17:16 +0000, Andres Freund wrote:
> How confident are we that this bug just affects DES?
It doesn't. I scanned master for the scatter-write idiom that des_init() uses
(`dst[idx[i]] = expr`) and then compiled each candidate on greenfly with both
gcc-13 and clang-20 at -O2 with -march=rv64gcv, diffing the disassembly for
indexed-store instructions. The static scan turned up ~30 sites; clang-20
actually emits RVV scatter/strided stores in three of them, where gcc-13 emits
scalar code:
contrib/pgcrypto/crypt-des.c des_init 3x vsoxei8 (the known case)
src/timezone/zic.c ~L2330 omittype[] 2x vsoxei8 (byte dest, same shape)
contrib/pg_trgm/trgm_op.c compactTrgm 3x vsse8 (strided, not indexed)
The zic.c hit is the one that bothers me. It's the same shape as des_init() --
byte-sized destination, scatter store via an indexing array -- and it's not
exercised by `make check`, since zic runs at `make install` to compile the IANA
tz data. If clang-20 miscompiles zic the same way it miscompiles des_init(),
the buildfarm wouldn't notice and the resulting tzdata would silently be wrong.
I haven't yet built and run that path on greenfly with clang-20 to confirm the
miscompile actually triggers there; I'll do that next.
The trgm_op.c hit is structurally different (strided store, not indexed
scatter), so it isn't the #176001 pattern. I'm flagging it because it's in the
same family of clang-20 RVV transformations on byte destinations, not because I
have a known miscompile for it.
Several other sites that match the source pattern (ginlogic.c MAYBE- twiddle,
spi.c attribute scatter, spgtextproc.c, ...) survive clang-20 codegen
unvectorized -- the autovec heuristics don't fire on the variable-length,
control-flow-dependent loops where this idiom tends to live in our tree. So the
*visible* exposure today appears bounded to crypt-des.c and zic.c, but I would
not bet money that the next clang release won't extend the autovec to those.
One more data point: the scatter stores only appear at -O2 and -O3, not -O1.
Our default is -O2.
> Have you confirmed that, by using a newer clang, the merging of the
> fixes actually fixes the problem?
Not yet. apt.llvm.org doesn't ship riscv64 packages, so I'm building clang from
llvm-project main on greenfly now (currently mid-build, ~24h on this CPU under
nice). Once it's installed I'll rerun the asm audit and the test suite under
it; if HEAD generates correct code for crypt-des.c and zic.c that's a real data
point for "reject affected clang" and gives us a version range for the cutoff.
> ISTM a perfectly viable patch would be to just reject building with a
> non-very-recent clang on riscv.
It's the cleanest answer if HEAD is in fact fixed, and I'd lean that way. The
cost is excluding the clang that ships in current Debian/ Ubuntu stable on
riscv64 (clang 20 in noble / trixie), which is what most riscv64 users actually
have today. I'd rather have the clang-HEAD bisect data before picking the
cutoff -- "reject < clang-N" is much more defensible with N pinned to a specific
fix.
> That seems like a pretty odd fix for the problem. If the problem is
> auto-vectorization, we should stop auto-vectorization, not sprinkle
> memory barriers around.
Agreed. pg_memory_barrier() was the first thing that made the test pass and I
shipped it without going back to the right primitive. The audit makes per-loop
pragmas look worse, not better: the affected sites aren't all in one file, and
one of them (zic.c) is third-party- ish code we mostly don't touch. In rough
order of locality:
1. #pragma clang loop vectorize(disable) on the four affected loops
in des_init() (and the equivalent in zic.c, if confirmed). Doesn't
scale if the next clang release vectorizes a fifth site.
2. A pg_attribute_no_vectorize on des_init() / the affected zic
function. Coarser, defensive against new scatter writes inside
the same function.
3. Configure-time -fno-tree-vectorize on crypt-des.c (and zic.c)
for clang on riscv. File scope. Simple and static.
4. Configure-time -fno-tree-vectorize globally for clang+riscv64
until clang-N, where N is the bisected fix. Or refuse < N.
5. Hard error in configure for clang versions in the affected range.
After the audit my preference shifted from (1) to (3) or (4): the bug isn't
DES-specific, the patch shouldn't be either, and decorating loops one at a time
as we trip over them is exactly the "sprinkle barriers around" complaint applied
a level up. (4) would also fix the secondary intermittent issue we've seen on
greenfly that I haven't been able to attribute to any specific loop.
I'll send v5 once I have:
- clang-HEAD on greenfly and the rerun audit/test results
- confirmation (or refutation) that the zic.c miscompile actually
triggers, not just that the scatter store is emitted
If both confirm what the asm audit suggests, v5 will drop the
pg_memory_barrier() approach in 0001 and replace it with either (3)
or (4)+(5), depending on whether clang-HEAD passes.
The other two patches do improve CRC32 and popcount on RISC-V, they are still
worth considering.
best.
-greg
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Peter Eisentraut | 2026-05-29 15:47:06 | Re: Heads Up: cirrus-ci is shutting down June 1st |
| Previous Message | Jacob Champion | 2026-05-29 15:43:07 | Re: future of PQfn() |