Invalid control file checksum with AVX-512 during initdb on a clang19 -O0 build

From: Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Invalid control file checksum with AVX-512 during initdb on a clang19 -O0 build
Date: 2025-06-11 18:10:31
Message-ID: CAE-ML+-OV6p9uvCFBcSQjZUEh__y0h-KjN+BseyGJHt7u8EP+w@mail.gmail.com
Lists: pgsql-bugs

Hello,

I noticed a strange issue that I can only reproduce with clang:
clang version 19.1.7 (RESF 19.1.7-2.module+el8.10.0+1965+112b558b) on
the devel branch (SHA: 137935bd1167a94b0bfea7239033f1ba1a1d95bb).

We are getting a control file checksum mismatch during initdb. I added
some prints in a small debug patch, and recorded the postgres process
using rr. I have uploaded the rr archive (made with rr pack, tar-ed up) [1].

$ initdb -n -d -D /usr/local/pgsql/data &> initdb.out

2025-06-11 16:19:54.343 UTC [3070] LOG: WriteControlFile crc = 3457434907, algo is avx = 1
...
2025-06-11 16:19:54.343 UTC [3070] LOG: ReadControlFile crc = 3457434907, ControlFile->crc = 3457434907, algo is avx = 1
...
2025-06-11 16:19:54.346 UTC [3070] LOG: update_controlfile crc = 2065009488, algo is avx = 1
...
2025-06-11 16:20:13.914 UTC [3070] LOG: update_controlfile crc = 3406554082, algo is avx = 1
...
2025-06-11 16:20:13.920 UTC [3070] LOG: update_controlfile crc = 1234673735, algo is avx = 1
...
2025-06-11 16:20:13.923 UTC [3070] NOTICE: database system is shut down
...
2025-06-11 16:20:13.923 UTC [3070] DEBUG: proc_exit(-1): 0 callbacks to make
ok
performing post-bootstrap initialization ... 2025-06-11 16:20:13.984 UTC [3072] LOG: ReadControlFile crc = 2925279607, ControlFile->crc = 1234673735, algo is avx = 1
2025-06-11 16:20:13.984 UTC [3072] FATAL: incorrect checksum in control file
child process exited with exit code 1
initdb: data directory "/usr/local/pgsql/data" not removed at user's request

Note that this only reproduces with clang-19 at -O0 and NOT at -O3. I haven't
tried other versions of clang.

OTOH, gcc-14 is fine at both -O0 and -O3, with the AVX-512 CRC
implementation getting picked in both cases.
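
For cross-checking the values in the log independently of either compiler, a
bitwise CRC-32C can be sketched in a few lines of Python. This is only a
sketch and not something taken from the PostgreSQL tree: the names
crc32c_reference and check_control_file are made up for illustration, and it
assumes that the control file CRC covers the first covered_len bytes of
pg_control (offsetof(ControlFileData, crc) in the build under test, which the
caller has to supply) and that the stored CRC follows immediately as a
little-endian 32-bit value.

```python
import struct

# Castagnoli polynomial in bit-reflected form, as used by CRC-32C.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c_reference(data: bytes, crc: int = 0) -> int:
    """Slow bit-at-a-time CRC-32C; useful only as a reference value."""
    crc ^= 0xFFFFFFFF                      # initial inversion
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial if that bit was set.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF                # final inversion


def check_control_file(raw: bytes, covered_len: int):
    """Return (recomputed, stored) CRCs for a pg_control image.

    covered_len is an assumption supplied by the caller: the number of
    leading bytes the CRC covers, with the stored CRC right after them.
    """
    computed = crc32c_reference(raw[:covered_len])
    (stored,) = struct.unpack_from("<I", raw, covered_len)
    return computed, stored
```

Running check_control_file() over the attached pg_control with the right
covered_len would indicate whether the stored 1234673735 or the recomputed
2925279607 is the correct CRC-32C for the bytes on disk.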

Environment:

(1) Configure options:
./configure --prefix=/usr/local/pgsql --with-python --enable-depend
--without-icu --enable-debug CFLAGS='-O0 -fno-omit-frame-pointer' CC=clang

(2) Config log shows:
configure:18262: checking for vectorized CRC-32C
configure:18268: result: AVX-512 with runtime check

pgac_cv_avx512_pclmul_intrinsics=yes
pgac_cv_xsave_intrinsics=yes

(3) Confirmation that we have AVX-512 CRC getting selected at runtime:
(rr) f
#0 WriteControlFile () at xlog.c:4386
4386 ControlFile->pg_control_version = PG_CONTROL_VERSION;
(rr) p pg_comp_crc32c
$1 = (pg_crc32c (*)(pg_crc32c, const void *, size_t)) 0xc8b8b0 <pg_comp_crc32c_avx512>

(4) This is running in a VM with:
Rocky Linux release 8.10 (Green Obsidian)
16 vCPUs
Hypervisor: VMware ESXi, 8.0.3, 24022510
Model: PowerEdge R650
Processor Type: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
vCenter:
Version: 8.0.3
Build: 24322831

I tried both vSAN and local storage, and it made no difference.
There is a known vSAN bug with invalid checksums + AVX-512, but that has
been fixed in an older version [2] (and besides, the issue reproduces with
local storage too).

Please let me know if there is any other info I can provide.

Regards,
Deep (VMware)

[1] https://drive.google.com/file/d/15bGN_NlGsYx0lJCMGnCxDnV2avYkujgk/view?usp=sharing
[2] https://knowledge.broadcom.com/external/article/367589/applications-using-avx512-instructions-i.html

Attachment Content-Type Size
pg_control application/octet-stream 8.0 KB
config.log text/x-log 359.5 KB
cpuid.out application/octet-stream 557.2 KB
initdb.out.tar.gz application/gzip 1.2 MB
debug.patch text/x-patch 2.2 KB
