Re: IPC/MultixactCreation on the Standby server

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Álvaro Herrera <alvherre(at)kurilemu(dot)de>
Cc: Dmitry <dsy(dot)075(at)yandex(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: IPC/MultixactCreation on the Standby server
Date: 2025-07-27 11:53:45
Message-ID: A638A5AC-B5F7-4E3E-BD07-EC0D56A14B41@yandex-team.ru
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 26 Jul 2025, at 22:44, Álvaro Herrera <alvherre(at)kurilemu(dot)de> wrote:
>
> On 2025-Jul-25, Andrey Borodin wrote:
>
>> Also I've discovered one more serious problem.
>> If a backend crashes just before WAL-logging multi, any heap tuple
>> that uses this multi will become unreadable. Any attempt to read it
>> will hang forever.
>>
>> I've reproduced the problem and now I'm working on scripting this
>> scenario. Basically, I modify code to hang forever after assigning
>> multi number 2.
>
> It took me a minute to understand this, and I think your description is
> slightly incorrect: you mean that the heap tuple that uses the PREVIOUS
> multixact cannot be read (at least, that's what I understand from your
> reproducer script).
Yes, I explained a bit incorrectly, but you got the problem correctly.

>
> Looking at this,
>
> /*
> * We want to avoid edge case 2 in redo, because we cannot wait for
> * startup process in GetMultiXactIdMembers() without risk of a
> * deadlock.
> */
> MultiXactId next = multi + 1;
> int next_pageno;
>
> /* Handle wraparound as GetMultiXactIdMembers() does it. */
> if (multi < FirstMultiXactId)
> multi = FirstMultiXactId;
>
> Don't you mean to test and change the value 'next' rather than 'multi'
> here?
Yup, that was typo.

>
> In this bit,
>
> * We do not need to handle race conditions, because this code
> * is only executed in redo and we hold
> * MultiXactOffsetSLRULock.
>
> I think it'd be good to have an
> Assert(LWLockHeldByMeInMode(MultiXactOffsetSLRULock, LW_EXCLUSIVE));
> just for peace of mind.
Ugh, that uncovered 17+ problem: now we have a couple of locks simultaneously. I'll post a version with this a later.

> Also, commit c61678551699 removed
> ZeroMultiXactOffsetPage(), but since you have 'false' as the second
> argument, then SimpleLruZeroPage() is enough. (I wondered why isn't
> WAL-logging necessary ... until I remember that we're in a standby. I
> think a simple comment here like "no WAL-logging because we're a
> standby" should suffice.)

Agree.

I've made a test [0] and discovered another problem. Adding this checkpoint breaks the test[1] even after a fix[2].
I suspect that excluding "edge case 2" on standby is simply not enough... we have to do this "next offset" dance on Primary too. I'll think more about other options.

Best regards, Andrey Borodin.
[0] https://github.com/x4m/postgres_g/commit/eafcaec7aafde064b0da5d2ba4041ed2fb134f07
[1] https://github.com/x4m/postgres_g/commit/da762c7cac56eff1988ea9126171ca0a6d2665e9
[2] https://github.com/x4m/postgres_g/commit/d64c17d697d082856e5fe8bd52abafc0585973af

Timeline of this commits can be seen here https://github.com/x4m/postgres_g/commits/mx19/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2025-07-27 13:42:53 Re: Returning nbtree posting list TIDs in DESC order during backwards scans
Previous Message Frédéric Yhuel 2025-07-27 10:46:44 Re: vacuumdb changes for stats import/export