Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8

From: Nazneen Jafri <jafrinazneen(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(at)paquier(dot)xyz>, Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>, Radim Marek <radim(at)boringsql(dot)com>, Marko Tiikkaja <marko(at)joh(dot)to>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
Date: 2026-05-27 02:55:14
Message-ID: CA+m5N8s5QGqqxu_re+YFv9PRNrisM7D-Cqbhfj=m8FNZLrovhg@mail.gmail.com
Lists: pgsql-bugs

Tested Andrey's demo.diff in a fresh environment:

- Primary: REL_16_8, Standby: REL_16_14 (--enable-cassert)

- ~2300 MultiXacts crossing the offsets page boundary

- Without patch: startup deadlocks at RecordNewMultiXact(multi=2047)

- With patch: standby replays all WAL and catches up
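
For readers wondering why the hang shows up at multi=2047: a rough sketch of the offsets-page arithmetic, assuming the default 8 kB block size and 4-byte offset entries of a stock PostgreSQL 16 build. The helper names are made up for illustration and wraparound is ignored. Under those assumptions multi 2047 is the last entry on offsets page 0 and the next multi lives on page 1, which is consistent with the "next offsets page is not initialized" message quoted below.

```python
# Illustrative only: assumes BLCKSZ = 8192 and 4-byte MultiXactOffset entries
# (stock PostgreSQL 16 build). Helper names are hypothetical, not PostgreSQL APIs.
BLCKSZ = 8192
OFFSET_SIZE = 4
OFFSETS_PER_PAGE = BLCKSZ // OFFSET_SIZE  # 2048 entries per offsets page


def offsets_page(multi: int) -> int:
    """Offsets SLRU page that holds the entry for this MultiXactId."""
    return multi // OFFSETS_PER_PAGE


def offsets_entry(multi: int) -> int:
    """Position of this MultiXactId's entry within its page."""
    return multi % OFFSETS_PER_PAGE


def next_multi_on_new_page(multi: int) -> bool:
    """True if multi + 1 falls on a different offsets page than multi."""
    return offsets_page(multi + 1) != offsets_page(multi)


if __name__ == "__main__":
    for m in (2046, 2047, 2048):
        print(m, offsets_page(m), offsets_entry(m), next_multi_on_new_page(m))
```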

Thanks,
Nazneen

On Tue, May 26, 2026 at 2:55 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:

>
>
> > On 26 May 2026, at 17:28, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> >
> > looks correct
>
> I tested that change as follows.
>
> Set up REL_16_0 as primary, REL_16_STABLE as standby.
>
> Generate multixacts in a single session using savepoints:
>
> BEGIN;
> SELECT * FROM t WHERE i = 1 FOR NO KEY UPDATE;
> -- repeat 2500 times:
> SAVEPOINT a; SELECT * FROM t WHERE i = 1 FOR UPDATE; ROLLBACK TO a;
> COMMIT;
>
> Each iteration creates a new MultiXactId. 2500 iterations cross the SLRU page
> boundary at multixact 2048 with some spare multis (we'll pickle the excess
> ones in jars when all is fixed, toying with 2048 wasted dev cycles for no
> reason).
>
> Test:
> 0. Run the workload on REL_16_0 primary (2500 multixacts, crossing page 0->1)
> 1. Take pg_basebackup
> 2. Run the workload again (2500 more, crossing page 1->2)
> 3. Start the standby
>
> I observe:
> Without the change, startup deadlocks.
> With the change, the standby catches up; the DEBUG1 message
> "next offsets page is not initialized, initializing it now"
> confirms the compat block fires correctly.
>
> I packaged this test into a buildfarm module (TestReplayXversion) [0] that
> builds REL_x_0 and runs this check against a REL_x_STABLE build. It
> reproduces the deadlock on 14, 15, and 16; 17 and 18 pass. Currently I'm
> struggling to inject a regress WAL trace into it; that is not working so far.
> On the bright side, I managed to get PR number 42 in the buildfarm client
> repo.
>
>
> Best regards, Andrey Borodin.
>
> [0] https://github.com/PGBuildFarm/client-code/pull/42
>
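
In case it helps anyone repeating this by hand, the savepoint workload quoted above can be generated with a few lines of Python. This is only a sketch: the function name and defaults are made up here, and it assumes a table t with an integer column i containing a row with i = 1, as the quoted SQL implies.

```python
# Sketch of a generator for the quoted savepoint workload.
# Assumptions (not from the original mail): the function name, its defaults,
# and that table "t" with a row i = 1 already exists on the primary.
import sys


def build_workload(iterations: int = 2500, table: str = "t", key: int = 1) -> str:
    """Return one transaction that repeats the SAVEPOINT / FOR UPDATE / ROLLBACK step."""
    lock_row = f"SELECT * FROM {table} WHERE i = {key}"
    lines = ["BEGIN;", f"{lock_row} FOR NO KEY UPDATE;"]
    for _ in range(iterations):
        # Same statement sequence as in the quoted message, one iteration per line.
        lines.append(f"SAVEPOINT a; {lock_row} FOR UPDATE; ROLLBACK TO a;")
    lines.append("COMMIT;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sys.stdout.write(build_workload())
```

Saved under a name of your choosing (gen_workload.py, say), the output can be piped straight into psql connected to the primary.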
