Re: IPC/MultixactCreation on the Standby server

From: Ivan Bykov <i(dot)bykov(at)modernsys(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Subject: Re: IPC/MultixactCreation on the Standby server
Date: 2025-11-24 14:32:26
Message-ID: 176399474623.1015.6638892143305450074.pgcf@coridan.postgresql.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Andrey!

> Thanks for your review!

Review still in progress, sorry for the delay. I didn't have enough time to fully understand the changes you suggest, but it seems there is
only a small gap in my understanding of what the patch does. Here is my explanation of the problem.

The main problem
=============

The main problem is that we may lose session context before writing the offset to SLRU (but we may write
a WAL record). It seems that the writer process got stuck in the XLogInsert procedure or even failed between GetNewMultiXactId 
and RecordNewMultiXact call. In this case, readers will wait to receive a conditional variable signal (from new multixacts)
but could not find a valid offset for the "failed" (it may be in WAL) multixid.

I illustrate this using the next diagram.

Writer Reader
--------------------------------------------------------------------------------
MultiXactIdCreateFromMembers
-> GetNewMultiXactId (101)
GetMultiXactIdMembers(100)
-> LWLockAcquire(MultiXactOffset)
-> read offset 100
-> read offset 101
-> LWLockRelease(MultiXactOffset)
offset 101 == 0
-> ConditionVariableSleep()
+--------------------------------------------------------------------------------------+
|-> XLogInsert |
+--------------------------------------------------------------------------------------+
-> RecordNewMultiXact
-> LWLockAcquire(MultiXactOffset)
-> write offset 101
-> LWLockRelease(MultiXactOffset)
-> ConditionVariableBroadcast(nextoff_cv);
-> retry:
-> LWLockAcquire(MultiXactOffset)
-> read offset 100
-> read offset 101
-> LWLockRelease(MultiXactOffset)
offset 101 != 0
-> length = offset 101 - read offset 100
-> LWLockAcquire(MultiXactMember)
-> write members 101
-> LWLockRelease(MultiXactOffset)

--------------------------------------------------------------------------------------

As far as I can see, your proposal seems to address exactly that problem.
The main difference from the former solution is writing to MultiXactOffset SLRU all required
information for the reader atomically on multiact insertion.
Before this change, we actually extended the multixact insertion time window to the next multixact
insertion time, and it seems a risky design.

I illustrate the new solution using the next diagram.

Writer Reader
--------------------------------------------------------------------------------
MultiXactIdCreateFromMembers
-> GetNewMultiXactId (100)
-> XLogInsert
-> RecordNewMultiXact
-> LWLockAcquire(MultiXactOffset)
-> write offset 100
-> write offset 101 *****************************************************************
-> LWLockRelease(MultiXactOffset)
GetMultiXactIdMembers(100)
-> LWLockAcquire(MultiXactOffset)
-> read offset 100
-> read offset 101
-> LWLockRelease(MultiXactOffset)
Assert(offset 101 == 0)
-> ConditionVariableSleep()
-> length = offset 101 - read offset 100
--------------------------------------------------------------------------------------

So if I understand the core idea of your solution right, I think that the code in the last patch
(v10-0001-Avoid-multixact-edge-case-2-by-writing-the-next-.patch) is correct and does what it should.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2025-11-24 14:45:48 Re: refactor AlterDomainAddConstraint (alter domain add constraint)
Previous Message Chao Li 2025-11-24 12:03:22 Re: get rid of Pointer type, mostly