| From: | Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
| Subject: | Invalid control file checksum with AVX-512 during initdb on a clang19 -O0 build |
| Date: | 2025-06-11 18:10:31 |
| Message-ID: | CAE-ML+-OV6p9uvCFBcSQjZUEh__y0h-KjN+BseyGJHt7u8EP+w@mail.gmail.com |
| Lists: | pgsql-bugs |
Hello,
I noticed a strange issue that I can only reproduce with clang:
clang version 19.1.7 (RESF 19.1.7-2.module+el8.10.0+1965+112b558b) on
the devel branch (SHA: 137935bd1167a94b0bfea7239033f1ba1a1d95bb).
We are getting a control file checksum mismatch during initdb. I added
some prints in a small debug patch, and recorded the postgres process
using rr. I have uploaded the rr archive (made with rr pack, tar-ed up) [1].
$ initdb -n -d -D /usr/local/pgsql/data &> initdb.out
2025-06-11 16:19:54.343 UTC [3070] LOG: WriteControlFile crc = 3457434907, algo is avx = 1
...
2025-06-11 16:19:54.343 UTC [3070] LOG: ReadControlFile crc = 3457434907, ControlFile->crc = 3457434907, algo is avx = 1
...
2025-06-11 16:19:54.346 UTC [3070] LOG: update_controlfile crc = 2065009488, algo is avx = 1
...
2025-06-11 16:20:13.914 UTC [3070] LOG: update_controlfile crc = 3406554082, algo is avx = 1
...
2025-06-11 16:20:13.920 UTC [3070] LOG: update_controlfile crc = 1234673735, algo is avx = 1
...
2025-06-11 16:20:13.923 UTC [3070] NOTICE: database system is shut down
...
2025-06-11 16:20:13.923 UTC [3070] DEBUG: proc_exit(-1): 0 callbacks to make
ok
performing post-bootstrap initialization ... 2025-06-11 16:20:13.984 UTC [3072] LOG: ReadControlFile crc = 2925279607, ControlFile->crc = 1234673735, algo is avx = 1
2025-06-11 16:20:13.984 UTC [3072] FATAL: incorrect checksum in control file
child process exited with exit code 1
initdb: data directory "/usr/local/pgsql/data" not removed at user's request
Note that this reproduces only with clang-19 at -O0, NOT at -O3. I haven't
tried other versions of clang.
By contrast, gcc-14 works at both -O0 and -O3, with the AVX-512 CRC
implementation getting picked in both cases.
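One way to tell which value in the failing ReadControlFile line is right (the recomputed 2925279607 or the stored 1234673735) is to recompute the checksum over the attached pg_control with a plain bitwise CRC-32C that involves no SIMD code. The following is a minimal sketch, not code from the PostgreSQL tree. The function names are hypothetical, and crc_offset stands for offsetof(ControlFileData, crc) in the build under test, which the caller has to supply. It assumes the CRC is stored in the host's native byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
 * Init and final XOR are both 0xFFFFFFFF, the usual CRC-32C convention. */
uint32_t crc32c_bitwise(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Sanity check against the standard CRC-32C check value for "123456789". */
void crc32c_selftest(void)
{
    assert(crc32c_bitwise("123456789", 9) == 0xE3069283u);
}

/* Hypothetical helper: buf holds the raw bytes of a control file.
 * crc_offset is an assumption supplied by the caller; it should be
 * offsetof(ControlFileData, crc) for the build that wrote the file.
 * Returns 1 on match, 0 on mismatch, -1 if the offset does not fit. */
int control_crc_matches(const unsigned char *buf, size_t buflen,
                        size_t crc_offset)
{
    uint32_t stored;

    if (buflen < sizeof(stored) || crc_offset > buflen - sizeof(stored))
        return -1;
    /* Assumes the CRC was written in native byte order. */
    memcpy(&stored, buf + crc_offset, sizeof(stored));
    return crc32c_bitwise(buf, crc_offset) == stored;
}
```

If the bitwise result agrees with the stored value, that would suggest the file on disk is consistent and the recomputation in the second process went wrong. If it agrees with neither value, the write side would be suspect as well.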
Environment:
(1) Configure options:
./configure --prefix=/usr/local/pgsql --with-python --enable-depend --without-icu --enable-debug CFLAGS='-O0 -fno-omit-frame-pointer' CC=clang
(2) Config log shows:
configure:18262: checking for vectorized CRC-32C
configure:18268: result: AVX-512 with runtime check
pgac_cv_avx512_pclmul_intrinsics=yes
pgac_cv_xsave_intrinsics=yes
(3) Confirmation that we have AVX-512 CRC getting selected at runtime:
(rr) f
#0 WriteControlFile () at xlog.c:4386
4386 ControlFile->pg_control_version = PG_CONTROL_VERSION;
(rr) p pg_comp_crc32c
$1 = (pg_crc32c (*)(pg_crc32c, const void *, size_t)) 0xc8b8b0 <pg_comp_crc32c_avx512>
(4) This is running in a VM with:
Rocky Linux release 8.10 (Green Obsidian)
16 vCPUs
Hypervisor: VMware ESXi, 8.0.3, 24022510
Model: PowerEdge R650
Processor Type: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
vCenter:
Version: 8.0.3
Build: 24322831
I tried both vSAN and local storage; the choice made no difference.
There is a known vSAN bug with invalid checksums + AVX-512, but that was
fixed in an older version [2] (and besides, the issue reproduces with
local storage too).
Please let me know if there is any other info I can provide.
Regards,
Deep (VMware)
[1]
https://drive.google.com/file/d/15bGN_NlGsYx0lJCMGnCxDnV2avYkujgk/view?usp=sharing
[2]
https://knowledge.broadcom.com/external/article/367589/applications-using-avx512-instructions-i.html
| Attachment | Content-Type | Size |
|---|---|---|
| pg_control | application/octet-stream | 8.0 KB |
| config.log | text/x-log | 359.5 KB |
| cpuid.out | application/octet-stream | 557.2 KB |
| initdb.out.tar.gz | application/gzip | 1.2 MB |
| debug.patch | text/x-patch | 2.2 KB |