| From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | greenfly lwlock corruption in REL_14_STABLE and REL_15_STABLE |
| Date: | 2025-12-10 05:10:20 |
| Message-ID: | CA+hUKGKOEzLyYWS4yAsrSi2shBZLs9hEfdzXi004tzKEO_JA4Q@mail.gmail.com |
| Lists: | pgsql-hackers |
Beginning a week ago, greenfly (RISC-V, Clang 20.1) has failed like
this in 5 of 8 runs of the pgbench tests on the two oldest branches:
TRAP: FailedAssertion("!(oldstate & LW_VAL_EXCLUSIVE)", File: "lwlock.c", Line: 1850, PID: 1536294)
postgres: main: gburd postgres [local] CREATE TYPE(ExceptionalCondition+0x72)[0x2ad1326922]
postgres: main: gburd postgres [local] CREATE TYPE(LWLockRelease+0x51e)[0x2ad1634e60]
postgres: main: gburd postgres [local] CREATE TYPE(_bt_first+0x7f8)[0x2ad139c314]
postgres: main: gburd postgres [local] CREATE TYPE(btgettuple+0xca)[0x2ad13996f8]
postgres: main: gburd postgres [local] CREATE TYPE(index_getnext_tid+0x2a)[0x2ad138bd66]
postgres: main: gburd postgres [local] CREATE TYPE(index_getnext_slot+0x24)[0x2ad138bf56]
postgres: main: gburd postgres [local] CREATE TYPE(systable_getnext+0x18)[0x2ad138a97c]
postgres: main: gburd postgres [local] CREATE TYPE(GetNewOidWithIndex+0xfc)[0x2ad13ed284]
postgres: main: gburd postgres [local] CREATE TYPE(EnumValuesCreate+0x58)[0x2ad14090ec]
postgres: main: gburd postgres [local] CREATE TYPE(DefineEnum+0x10a)[0x2ad14bb948]
postgres: main: gburd postgres [local] CREATE TYPE(+0x3f0336)[0x2ad164a336]
postgres: main: gburd postgres [local] CREATE TYPE(standard_ProcessUtility+0x468)[0x2ad1649560]
postgres: main: gburd postgres [local] CREATE TYPE(+0x3eec0e)[0x2ad1648c0e]
postgres: main: gburd postgres [local] CREATE TYPE(+0x3ee418)[0x2ad1648418]
postgres: main: gburd postgres [local] CREATE TYPE(PortalRun+0x160)[0x2ad1647ec8]
postgres: main: gburd postgres [local] CREATE TYPE(PostgresMain+0x1b34)[0x2ad1646000]
postgres: main: gburd postgres [local] CREATE TYPE(+0x36205a)[0x2ad15bc05a]
postgres: main: gburd postgres [local] CREATE TYPE(ClosePostmasterPorts+0x0)[0x2ad15bb8e0]
postgres: main: gburd postgres [local] CREATE TYPE(PostmasterMain+0x100a)[0x2ad15b92ac]
postgres: main: gburd postgres [local] CREATE TYPE(+0x2cac90)[0x2ad1524c90]
/lib/riscv64-linux-gnu/libc.so.6(+0x277cc)[0x3f9caa77cc]
/lib/riscv64-linux-gnu/libc.so.6(__libc_start_main+0x78)[0x3f9caa7878]
postgres: main: gburd postgres [local] CREATE TYPE(_start+0x20)[0x2ad1326ac0]
That's:
    if (mode == LW_EXCLUSIVE)
        oldstate = pg_atomic_sub_fetch_u32(&lock->state, LW_VAL_EXCLUSIVE);
    else
        oldstate = pg_atomic_sub_fetch_u32(&lock->state, LW_VAL_SHARED);
    /* nobody else can have that kind of lock */
    Assert(!(oldstate & LW_VAL_EXCLUSIVE));
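To make the check concrete, here is a minimal standalone sketch of the same release arithmetic, using C11 atomics instead of PostgreSQL's pg_atomic layer. The bit values for LW_VAL_EXCLUSIVE and LW_VAL_SHARED are assumed to match lwlock.c in these branches and should be verified against the source; release_check() and release_from() are hypothetical helpers, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match lwlock.c in these branches; verify against the source. */
#define LW_VAL_EXCLUSIVE ((uint32_t) 1 << 24)
#define LW_VAL_SHARED    1

typedef enum { LW_EXCLUSIVE, LW_SHARED } LWLockMode;

/* Stand-in for pg_atomic_sub_fetch_u32(): subtract, return the NEW value. */
static uint32_t
sub_fetch_u32(_Atomic uint32_t *state, uint32_t sub)
{
    return atomic_fetch_sub(state, sub) - sub;
}

/*
 * Hypothetical helper: do the release-side subtraction and report whether
 * Assert(!(oldstate & LW_VAL_EXCLUSIVE)) would have held afterwards.
 */
bool
release_check(_Atomic uint32_t *state, LWLockMode mode)
{
    uint32_t oldstate;

    assert(state != NULL);

    if (mode == LW_EXCLUSIVE)
        oldstate = sub_fetch_u32(state, LW_VAL_EXCLUSIVE);
    else
        oldstate = sub_fetch_u32(state, LW_VAL_SHARED);

    /* despite its name, oldstate is the value after the subtraction */
    return (oldstate & LW_VAL_EXCLUSIVE) == 0;
}

/* Convenience wrapper: start from 'initial', then release in 'mode'. */
bool
release_from(uint32_t initial, LWLockMode mode)
{
    _Atomic uint32_t state;

    atomic_init(&state, initial);
    return release_check(&state, mode);
}
```

Under these assumptions, release_from(LW_VAL_SHARED, LW_EXCLUSIVE) reports a failed check, because subtracting the exclusive value from a state that holds only one shared count wraps around and leaves bit 24 set. A shared release from a state that somehow carries both an exclusive holder and a shared count fails too. This only illustrates which state/mode combinations can trip the assertion; it says nothing about which one greenfly actually hit.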
I will see if I can reproduce it or see something wrong under qemu,
but that'll take some time to set up...
Since the RISC-V GCC animals aren't showing any problem, I wondered if
this could be related to commits d8ba910 and 1c7cba4. But those went in
~30 days ago, were applied to all branches, and prevented reordering of
non-atomic loads, whereas here I assume we have __sync_fetch_and_sub(),
with no dependency on other memory that I can see at first glance.

Commits 332693e and da39714 touched lwlock.c ~15 days ago, but not in a
way that immediately seems relevant. If there were a relevant difference
in the flag protocol in these branches, why would only this system be
affected? It also passed half a dozen times before the cluster of
failures. That seems to point back towards codegen problems, but perhaps
of a different kind. Unless something else is going really wrong, but
it's hard to imagine that we forgot which lock type we held...
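One way the codegen suspicion could be probed in isolation is a small stress loop around the __sync builtins, run on the affected toolchain or under qemu. The sketch below is hypothetical and far simpler than the pgbench workload: it exercises only shared-style add/subtract pairs, derives the post-subtraction value the same way a sub_fetch built on __sync_fetch_and_sub() would, and counts any appearance of the exclusive bit. The constants are assumed to match lwlock.c, and stress_shared_release() is an invented name.

```c
#include <pthread.h>
#include <stdint.h>

/* Assumed to match lwlock.c in these branches; verify against the source. */
#define LW_VAL_EXCLUSIVE ((uint32_t) 1 << 24)
#define LW_VAL_SHARED    1
#define MAX_THREADS      64

static volatile uint32_t lock_state;    /* stand-in for lock->state */
static volatile uint32_t violations;    /* times the exclusive bit showed up */
static int loops_per_thread;

static void *
worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < loops_per_thread; i++)
    {
        uint32_t newstate;

        /* "Acquire" in shared mode: bump the share count. */
        __sync_fetch_and_add(&lock_state, LW_VAL_SHARED);

        /* "Release": fetch-and-sub, then derive the post-subtraction value. */
        newstate = __sync_fetch_and_sub(&lock_state, LW_VAL_SHARED) - LW_VAL_SHARED;

        /* No thread ever locks exclusively here, so this bit must stay 0. */
        if (newstate & LW_VAL_EXCLUSIVE)
            __sync_fetch_and_add(&violations, 1);
    }
    return NULL;
}

/*
 * Hypothetical driver: returns the number of anomalies seen (0 is the
 * expected result), or -1 on bad arguments or thread creation failure.
 */
long
stress_shared_release(int nthreads, int loops)
{
    pthread_t threads[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS || loops < 0)
        return -1;

    lock_state = 0;
    violations = 0;
    loops_per_thread = loops;

    while (started < nthreads &&
           pthread_create(&threads[started], NULL, worker, NULL) == 0)
        started++;

    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    if (started < nthreads)
        return -1;

    /* A leftover share count would be as suspicious as a stray bit. */
    return (long) violations + (lock_state != 0 ? 1 : 0);
}
```

Built with something like `cc -O2 -pthread` and called as stress_shared_release(4, 1000000), a nonzero result would point at the atomic read-modify-write itself. A zero result proves little, since the real failure involves the full lock protocol, including exclusive holders and waiter flags.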
date       | branch        | commit                          | assert_failed
-----------+---------------+---------------------------------+---------------
2025-12-09 | REL_15_STABLE | f188bc5 doc: Fix statement a... |
2025-12-09 | REL_14_STABLE | 4c4fa53 doc: Fix statement a... | t
2025-12-09 | REL_15_STABLE | 52a9588 Doc: fix typo in has... | t
2025-12-05 | REL_15_STABLE | b9a02b9 Fix setting next mul... |
2025-12-05 | REL_14_STABLE | 4896955 Fix setting next mul... |
2025-12-05 | REL_15_STABLE | 7e54eac Show version of node... | t
2025-12-03 | REL_15_STABLE | 8cfb174 Set next multixid's ... | t
2025-12-03 | REL_14_STABLE | 81416e1 Set next multixid's ... | t
2025-12-02 | REL_15_STABLE | 7792bdc Fix amcheck's handli... |
2025-12-02 | REL_14_STABLE | fbb4b60 Fix amcheck's handli... |
2025-11-29 | REL_15_STABLE | 134a8ee Avoid rewriting data... |
2025-11-29 | REL_14_STABLE | 2d5b97b Avoid rewriting data... |
2025-11-27 | REL_15_STABLE | f19502f Allow indexscans on ... |
2025-11-27 | REL_14_STABLE | 9e77323 Allow indexscans on ... |
2025-11-27 | REL_15_STABLE | f9f9283 doc: Fix misleading ... |
2025-11-26 | REL_15_STABLE | eb7743e doc: Clarify passphr... |
2025-11-26 | REL_14_STABLE | 9a26ff8 doc: Clarify passphr... |
2025-11-25 | REL_15_STABLE | da39714 lwlock: Fix, current... |
2025-11-25 | REL_14_STABLE | 332693e lwlock: Fix, current... |
2025-11-24 | REL_15_STABLE | ea757e8 Fix incorrect IndexO... |
2025-11-24 | REL_14_STABLE | ea36c2f Fix incorrect IndexO... |
2025-11-22 | REL_15_STABLE | 5516485 jit: Adjust AArch64-... |
2025-11-22 | REL_14_STABLE | 035a1f5 jit: Adjust AArch64-... |
2025-11-19 | REL_15_STABLE | 7c49407 Print new OldestXID ... |
2025-11-19 | REL_14_STABLE | 11cc0f4 Print new OldestXID ... |
2025-11-18 | REL_15_STABLE | 9f5a58a Don't allow CTEs to ... |
2025-11-18 | REL_14_STABLE | b853974 Don't allow CTEs to ... |
2025-11-18 | REL_15_STABLE | 3995e4a Define PS_USE_CLOBBE... |
2025-11-18 | REL_14_STABLE | 29a3e22 Define PS_USE_CLOBBE... |
2025-11-17 | REL_15_STABLE | ad5cc3a Update .abi-complian... |
2025-11-16 | REL_15_STABLE | 5d5b05c Doc: include MERGE i... |
2025-11-14 | REL_15_STABLE | d61af52 Add note about Creat... |
2025-11-14 | REL_14_STABLE | 4c179cc Add note about Creat... |
2025-11-13 | REL_15_STABLE | c663152 doc: Improve descrip... |
2025-11-13 | REL_14_STABLE | 7aa83ea doc: Improve descrip... |
2025-11-12 | REL_15_STABLE | 21a9014 Clear 'xid' in dummy... |
2025-11-12 | REL_14_STABLE | 84f1bf4 Clear 'xid' in dummy... |
2025-11-12 | REL_14_STABLE | 4ef048f doc: Document effect... |
2025-11-12 | REL_15_STABLE | 608566b doc: Document effect... |
2025-11-12 | REL_14_STABLE | f8a0ea8 Fix range for commit... |
2025-11-12 | REL_15_STABLE | 97cd4b6 Fix pg_upgrade aroun... |
2025-11-12 | REL_15_STABLE | 74b26c8 doc: Fix incorrect s... |
2025-11-11 | REL_15_STABLE | 32f3881 Stamp 15.15....         |
2025-11-11 | REL_14_STABLE | 9ad034b Stamp 14.20....         |
2025-11-10 | REL_15_STABLE | 70d03b5 Last-minute updates ... |
2025-11-10 | REL_14_STABLE | ee953cd Last-minute updates ... |
2025-11-10 | REL_15_STABLE | 9142156 libpq: Prevent some ... |
2025-11-10 | REL_14_STABLE | e792be6 Translation updates...  |
2025-11-09 | REL_15_STABLE | e334e80 Release notes for 18... |
2025-11-09 | REL_14_STABLE | 06827c5 Release notes for 18... |
2025-11-08 | REL_15_STABLE | 1c7cba4 Fix generic read and... |
2025-11-08 | REL_14_STABLE | d8ba910 Fix generic read and... |