Failure in test_slru for host gokiburi (REL_16_STABLE only)

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>
Subject: Failure in test_slru for host gokiburi (REL_16_STABLE only)
Date: 2026-05-18 11:41:45
Message-ID: agr6-cIQ4EUA86Cs@paquier.xyz
Lists: pgsql-hackers

Hi all,

gokiburi has been failing only on REL_16_STABLE for the last few days,
in the tests of the module test_slru. First failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gokiburi&dt=2026-05-13%2012%3A20%3A45

Set of changes associated with the first failure, which seem
completely innocent to me:
5f12d86dd76 Wed May 13 05:43:49 2026 UTC Add more tests for corrupted data with pglz_decompress()
d140237dab8 Wed May 13 02:46:17 2026 UTC Fix stale COPY progress during logical replication table sync

While the buildfarm runs don't show much, I have been able to
reproduce the failure on the buildfarm host after building with
-DEXEC_BACKEND. Here is a backtrace, suggesting that something is
broken with LWLock initialization:
2026-05-18 05:20:50.186 UTC client backend[870830] pg_regress/test_slru STATEMENT: SELECT test_slru_page_readonly(12377);
TRAP: failed Assert("LWLockHeldByMe(TestSLRULock)"), File: "test_slru.c", Line: 124, PID: 870830
postgres: popo contrib_regression [local]
SELECT(ExceptionalCondition+0x16c) [0xaaaaabcf4d88]
/home/popo/lib/test_slru.so(test_slru_page_readonly+0xe4) [0xffffedf83060]
postgres: popo contrib_regression [local] SELECT(+0x885c40) [0xaaaaab325c40]
postgres: popo contrib_regression [local] SELECT(ExecInterpExprStillValid+0x84) [0xaaaaab329a4c]
postgres: popo contrib_regression [local] SELECT(+0x9405fc) [0xaaaaab3e05fc]
postgres: popo contrib_regression [local] SELECT(+0x9406d4) [0xaaaaab3e06d4]
postgres: popo contrib_regression [local] SELECT(+0x940b34) [0xaaaaab3e0b34]
postgres: popo contrib_regression [local] SELECT(+0x8b7ac0) [0xaaaaab357ac0]
postgres: popo contrib_regression [local] SELECT(+0x89de14) [0xaaaaab33de14]
postgres: popo contrib_regression [local] SELECT(+0x8a46c0) [0xaaaaab3446c0]
postgres: popo contrib_regression [local] SELECT(standard_ExecutorRun+0x2d0) [0xaaaaab33ec68]
postgres: popo contrib_regression [local] SELECT(ExecutorRun+0xb8) [0xaaaaab33e970]
postgres: popo contrib_regression [local] SELECT(+0xe550dc) [0xaaaaab8f50dc]
postgres: popo contrib_regression [local] SELECT(PortalRun+0x460) [0xaaaaab8f4958]
postgres: popo contrib_regression [local] SELECT(+0xe43150) [0xaaaaab8e3150]
postgres: popo contrib_regression [local] SELECT(PostgresMain+0x15e8) [0xaaaaab8f0560]
postgres: popo contrib_regression [local] SELECT(postmaster_forkexec+0x0) [0xaaaaab70f644]
postgres: popo contrib_regression [local] SELECT(SubPostmasterMain+0x6fc) [0xaaaaab7106d8]
postgres: popo contrib_regression [local] SELECT(main+0x6d0) [0xaaaaab463f6c]
/lib/aarch64-linux-gnu/libc.so.6(+0x2225c) [0xfffff725225c]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x9c) [0xfffff725233c]
postgres: popo contrib_regression [local] SELECT(_start+0x30) [0xaaaaaad3d4b0]

The server logs include the following, pointing to a broken state
(these two should not fail):
2026-05-18 05:20:50.184 UTC client backend[870830] pg_regress/test_slru ERROR: lock <unassigned:0> is not held
2026-05-18 05:20:50.184 UTC client backend[870830] pg_regress/test_slru STATEMENT: SELECT test_slru_page_write(12345, 'Test SLRU');

Note that the tests pass without -DEXEC_BACKEND.

While reading through the module, I think that the LWLock
initialization logic is borked: we call LWLockInitialize() more times
than necessary, confusing the lock's internal state. Honestly, I have
no clue why the test has suddenly started failing, or why other
buildfarm members don't complain. The host was upgraded a couple of
days ago to the latest Debian, but I also had a few clean runs in the
buildfarm before this began showing up. What I do know is that the
attached patch makes the tests of the module pass for v16 on the
problematic host with -DEXEC_BACKEND.
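
To illustrate the general class of problem suspected here, below is a
minimal standalone sketch. It is not the attached patch and not code
from test_slru: DemoLock, DemoShared, demo_attach() and the other names
are hypothetical stand-ins, and the "lock" is just a counter. It only
shows why re-running an initialization routine on a lock that is
already in use wipes out its state, and how an initialize-once guard
avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock; NOT the PostgreSQL LWLock struct. */
typedef struct DemoLock
{
	int			tranche_id;		/* 0 means "unassigned" in this sketch */
	int			holders;		/* number of current holders */
} DemoLock;

/* Hypothetical shared area that every process attaches to. */
typedef struct DemoShared
{
	bool		initialized;
	DemoLock	lock;
} DemoShared;

/* Reset the lock to a pristine state, forgetting any holder. */
void
demo_lock_init(DemoLock *lock, int tranche_id)
{
	lock->tranche_id = tranche_id;
	lock->holders = 0;
}

void
demo_lock_acquire(DemoLock *lock)
{
	assert(lock->tranche_id != 0);	/* must have been initialized */
	lock->holders++;
}

bool
demo_lock_held(const DemoLock *lock)
{
	return lock->holders > 0;
}

/*
 * Attach to the shared area.  With init_once set, the lock is initialized
 * only by the first caller; without it, every caller re-initializes it.
 */
void
demo_attach(DemoShared *shared, int tranche_id, bool init_once)
{
	if (init_once && shared->initialized)
		return;
	demo_lock_init(&shared->lock, tranche_id);
	shared->initialized = true;
}

/* Attach, take the lock, attach a second time, then report the lock state. */
bool
demo_held_after_reattach(bool init_once)
{
	DemoShared	shared = {0};

	demo_attach(&shared, 42, init_once);
	demo_lock_acquire(&shared.lock);
	demo_attach(&shared, 42, init_once);	/* second "process" attaching */
	return demo_lock_held(&shared.lock);
}
```

In this sketch, demo_held_after_reattach(true) reports the lock as
still held, while demo_held_after_reattach(false) reports it as not
held because the second attach reset it. Whether this matches what
happens in the module is an assumption on my side.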

Comments or opinions?
--
Michael

Attachment Content-Type Size
0001-test_slru-Fix-LWLock-allocation-logic.patch text/plain 2.6 KB
