Re: 17.8 standby crashes during WAL replay from 17.5 primary: "could not access status of transaction"

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, Sebastian Webber <sebastian(at)swebber(dot)me>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Andrey Borodin <amborodin(at)acm(dot)org>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Dmitry Yurichev <dsy(dot)075(at)yandex(dot)ru>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Ivan Bykov <i(dot)bykov(at)modernsys(dot)ru>
Subject: Re: 17.8 standby crashes during WAL replay from 17.5 primary: "could not access status of transaction"
Date: 2026-02-18 08:58:03
Message-ID: 3EA622D2-635C-4C3E-9B64-90D1BAFD5C11@yandex-team.ru
Lists: pgsql-bugs

> On 16 Feb 2026, at 21:01, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> Andrey if you can verify with your TAP test, too, that'd be great.

Here's a hand-wavy test on top of REL_17_STABLE. It modifies the binaries to simulate the old WAL write behavior.
I tried to gate this behind -DDEMO_SIMULATE_OLD_MULTIXACT_BEHAVIOR, but gave up and just hardcoded it.
We are not going to commit it anyway, are we?

If we comment out this line (the patch does this)

    pg_atomic_write_u64(&MultiXactOffsetCtl->shared->latest_page_number,
                        pageno);

the test will pass.

Otherwise it will hang indefinitely, because replay on the standby fails with:

2026-02-18 13:44:12.238 +05 [52360] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
2026-02-18 13:44:12.250 +05 [52359] FATAL: could not access status of transaction 4096
2026-02-18 13:44:12.250 +05 [52359] DETAIL: Could not read from file "pg_multixact/offsets/0000" at offset 16384: read too few bytes.
2026-02-18 13:44:12.250 +05 [52359] CONTEXT: WAL redo at 0/30245E0 for MultiXact/CREATE_ID: 4095 offset 8189 nmembers 2: 4835 (sh) 4835 (upd)
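
For anyone reading the log above, the numbers line up with plain SLRU arithmetic. Below is a minimal sketch, not code from the patch: the demo_* names are made up for illustration, and the constants assume a default build (8 kB blocks, 4-byte offsets entries, 32 pages per segment file).

```c
#include <stdint.h>

/*
 * Assumptions (not taken from the patch): default 8 kB blocks, 4-byte
 * entries in pg_multixact/offsets, 32 pages per SLRU segment file.
 */
#define DEMO_BLCKSZ 8192
#define DEMO_OFFSET_ENTRY_SIZE 4
#define DEMO_PAGES_PER_SEGMENT 32
#define DEMO_ENTRIES_PER_PAGE (DEMO_BLCKSZ / DEMO_OFFSET_ENTRY_SIZE)

/* SLRU page that holds the offsets entry for a multixact ID. */
uint32_t demo_mxid_to_page(uint32_t mxid)
{
    return mxid / DEMO_ENTRIES_PER_PAGE;
}

/* Index of the entry within its page. */
uint32_t demo_mxid_to_entry(uint32_t mxid)
{
    return mxid % DEMO_ENTRIES_PER_PAGE;
}

/* Segment file number (0 corresponds to the file named "0000"). */
uint32_t demo_page_to_segment(uint32_t page)
{
    return page / DEMO_PAGES_PER_SEGMENT;
}

/* Byte offset of the start of the page inside its segment file. */
uint32_t demo_page_file_offset(uint32_t page)
{
    return (page % DEMO_PAGES_PER_SEGMENT) * DEMO_BLCKSZ;
}
```

Under these assumptions, multixact 4096 maps to page 2, entry 0, in segment 0 at byte offset 16384, which is the read that fails in the DETAIL line. Multixact 4095 from the CREATE_ID record maps to entry 2047 of page 1, i.e. the last slot before that page boundary.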

The most hand-wavy part is test_multixact_write_truncate_wal(): the truncation is synthetic.

FWIW, a lot of the calculations and comments were done by an LLM. Let me know if such verbosity hurts readability.

Best regards, Andrey Borodin.

Attachment: 0001-Test-Multixact-truncation-near-page-boundary-replay-.patch (application/octet-stream, 13.6 KB)
