| From: | Nazneen Jafri <jafrinazneen(at)gmail(dot)com> |
|---|---|
| To: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
| Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(at)paquier(dot)xyz>, Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>, Radim Marek <radim(at)boringsql(dot)com>, Marko Tiikkaja <marko(at)joh(dot)to>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8 |
| Date: | 2026-05-27 02:55:14 |
| Message-ID: | CA+m5N8s5QGqqxu_re+YFv9PRNrisM7D-Cqbhfj=m8FNZLrovhg@mail.gmail.com |
| Lists: | pgsql-bugs |
Tested Andrey's demo.diff on a fresh environment:
- Primary: REL_16_8, Standby: REL_16_14 (--enable-cassert)
- ~2300 MultiXacts crossing the offsets page boundary
- Without patch: startup deadlocks at RecordNewMultiXact(multi=2047)
- With patch: standby replays all WAL and catches up
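
For readers checking the numbers, here is a minimal sketch of the page arithmetic behind the multi=2047 boundary. It assumes the default 8 kB block size and 4-byte offset entries, which gives 2048 entries per offsets page; the helper names are illustrative, not PostgreSQL functions.

```python
# Assumption: default BLCKSZ of 8192 bytes and 4-byte MultiXactOffset entries,
# which yields 2048 offsets per SLRU page.
BLCKSZ = 8192
SIZEOF_MULTIXACT_OFFSET = 4
MULTIXACT_OFFSETS_PER_PAGE = BLCKSZ // SIZEOF_MULTIXACT_OFFSET


def offsets_page(multi: int) -> int:
    """Offsets page that holds the entry for this multixact ID."""
    return multi // MULTIXACT_OFFSETS_PER_PAGE


def offsets_entry(multi: int) -> int:
    """Position of this multixact ID's entry within its offsets page."""
    return multi % MULTIXACT_OFFSETS_PER_PAGE


if __name__ == "__main__":
    for multi in (2047, 2048):
        print(multi, offsets_page(multi), offsets_entry(multi))
```

Under these assumptions, multi 2047 is the last entry on page 0 and multi 2048 is the first entry on page 1, which matches the boundary where the unpatched standby stops.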
Thanks,
Nazneen
On Tue, May 26, 2026 at 2:55 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
>
> > On 26 May 2026, at 17:28, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> >
> > looks correct
>
> I tested that change as follows.
>
> Set up REL_16_0 as primary, REL_16_STABLE as standby.
>
> Generate multixacts in a single session using savepoints:
>
> BEGIN;
> SELECT * FROM t WHERE i = 1 FOR NO KEY UPDATE;
> -- repeat 2500 times:
> SAVEPOINT a; SELECT * FROM t WHERE i = 1 FOR UPDATE; ROLLBACK TO a;
> COMMIT;
>
> Each iteration creates a new MultiXactId. 2500 iterations cross the SLRU page
> boundary at multixact 2048 with some spare multis (we'll pickle the excess ones
> in jars when all is fixed, toying with 2048 wasted dev cycles for no reason).
>
> Test:
> 0. Run the workload on REL_16_0 primary (2500 multixacts, crossing page 0->1)
> 1. Take pg_basebackup
> 2. Run the workload again (2500 more, crossing page 1->2)
> 3. Start the standby
>
> I observe:
> Without the change, startup deadlocks.
> With the change, the standby catches up; the DEBUG1 message
> "next offsets page is not initialized, initializing it now"
> confirms the compat block fires correctly.
>
> I packaged this test into a buildfarm module (TestReplayXversion) [0] that
> builds REL_x_0 and runs this check on a REL_x_STABLE build. It reproduces the
> deadlock on 14, 15, and 16; 17 and 18 pass. Currently I'm struggling to inject
> the regress WAL trace into it; it is not working so far. On the bright side, I
> managed to get PR number 42 in the buildfarm client repo.
>
>
> Best regards, Andrey Borodin.
>
> [0] https://github.com/PGBuildFarm/client-code/pull/42
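
The savepoint workload quoted above is described with a "repeat 2500 times" comment, so it cannot be pasted as-is. Below is a minimal sketch of a generator that expands it into a plain SQL script. It assumes a table `t` with an integer column `i` and a row with `i = 1` already exists; the function name and its parameters are illustrative and not part of the original test.

```python
def savepoint_workload(iterations: int = 2500, table: str = "t", key: int = 1) -> str:
    """Build the single-session script: one NO KEY UPDATE lock, then repeated
    SAVEPOINT / FOR UPDATE / ROLLBACK TO cycles, then COMMIT."""
    lines = [
        "BEGIN;",
        f"SELECT * FROM {table} WHERE i = {key} FOR NO KEY UPDATE;",
    ]
    for _ in range(iterations):
        # Per the quoted message, each cycle is expected to create a new MultiXactId.
        lines.append(
            f"SAVEPOINT a; SELECT * FROM {table} WHERE i = {key} FOR UPDATE; ROLLBACK TO a;"
        )
    lines.append("COMMIT;")
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file and running it with `psql -f` against the primary should reproduce one round of the workload; running it a second time after taking the base backup corresponds to step 2 of the test.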