From: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
---|---|
To: | Álvaro Herrera <alvherre(at)kurilemu(dot)de> |
Cc: | Dmitry <dsy(dot)075(at)yandex(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: IPC/MultixactCreation on the Standby server |
Date: | 2025-07-27 11:53:45 |
Message-ID: | A638A5AC-B5F7-4E3E-BD07-EC0D56A14B41@yandex-team.ru |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On 26 Jul 2025, at 22:44, Álvaro Herrera <alvherre(at)kurilemu(dot)de> wrote:
>
> On 2025-Jul-25, Andrey Borodin wrote:
>
>> Also I've discovered one more serious problem.
>> If a backend crashes just before WAL-logging multi, any heap tuple
>> that uses this multi will become unreadable. Any attempt to read it
>> will hang forever.
>>
>> I've reproduced the problem and now I'm working on scripting this
>> scenario. Basically, I modify code to hang forever after assigning
>> multi number 2.
>
> It took me a minute to understand this, and I think your description is
> slightly incorrect: you mean that the heap tuple that uses the PREVIOUS
> multixact cannot be read (at least, that's what I understand from your
> reproducer script).
Yes, I explained a bit incorrectly, but you got the problem correctly.
>
> Looking at this,
>
> /*
> * We want to avoid edge case 2 in redo, because we cannot wait for
> * startup process in GetMultiXactIdMembers() without risk of a
> * deadlock.
> */
> MultiXactId next = multi + 1;
> int next_pageno;
>
> /* Handle wraparound as GetMultiXactIdMembers() does it. */
> if (multi < FirstMultiXactId)
> multi = FirstMultiXactId;
>
> Don't you mean to test and change the value 'next' rather than 'multi'
> here?
Yup, that was typo.
>
> In this bit,
>
> * We do not need to handle race conditions, because this code
> * is only executed in redo and we hold
> * MultiXactOffsetSLRULock.
>
> I think it'd be good to have an
> Assert(LWLockHeldByMeInMode(MultiXactOffsetSLRULock, LW_EXCLUSIVE));
> just for peace of mind.
Ugh, that uncovered 17+ problem: now we have a couple of locks simultaneously. I'll post a version with this a later.
> Also, commit c61678551699 removed
> ZeroMultiXactOffsetPage(), but since you have 'false' as the second
> argument, then SimpleLruZeroPage() is enough. (I wondered why isn't
> WAL-logging necessary ... until I remember that we're in a standby. I
> think a simple comment here like "no WAL-logging because we're a
> standby" should suffice.)
Agree.
I've made a test [0] and discovered another problem. Adding this checkpoint breaks the test[1] even after a fix[2].
I suspect that excluding "edge case 2" on standby is simply not enough... we have to do this "next offset" dance on Primary too. I'll think more about other options.
Best regards, Andrey Borodin.
[0] https://github.com/x4m/postgres_g/commit/eafcaec7aafde064b0da5d2ba4041ed2fb134f07
[1] https://github.com/x4m/postgres_g/commit/da762c7cac56eff1988ea9126171ca0a6d2665e9
[2] https://github.com/x4m/postgres_g/commit/d64c17d697d082856e5fe8bd52abafc0585973af
Timeline of this commits can be seen here https://github.com/x4m/postgres_g/commits/mx19/
From | Date | Subject | |
---|---|---|---|
Next Message | Andy Fan | 2025-07-27 13:42:53 | Re: Returning nbtree posting list TIDs in DESC order during backwards scans |
Previous Message | Frédéric Yhuel | 2025-07-27 10:46:44 | Re: vacuumdb changes for stats import/export |