| From: | Ivan Bykov <i(dot)bykov(at)modernsys(dot)ru> |
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Cc: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
| Subject: | Re: IPC/MultixactCreation on the Standby server |
| Date: | 2025-11-24 14:32:26 |
| Message-ID: | 176399474623.1015.6638892143305450074.pgcf@coridan.postgresql.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi, Andrey!
> Thanks for your review!
Review still in progress, sorry for the delay. I didn't have enough time to fully understand the changes you suggest, but it seems there is
only a small gap in my understanding of what the patch does. Here is my explanation of the problem.
The main problem
=============
The main problem is that we may lose session context before writing the offset to SLRU (but we may write
a WAL record). It seems that the writer process got stuck in the XLogInsert procedure or even failed between GetNewMultiXactId
and RecordNewMultiXact call. In this case, readers will wait to receive a conditional variable signal (from new multixacts)
but could not find a valid offset for the "failed" (it may be in WAL) multixid.
I illustrate this using the next diagram.
Writer Reader
--------------------------------------------------------------------------------
MultiXactIdCreateFromMembers
-> GetNewMultiXactId (101)
GetMultiXactIdMembers(100)
-> LWLockAcquire(MultiXactOffset)
-> read offset 100
-> read offset 101
-> LWLockRelease(MultiXactOffset)
offset 101 == 0
-> ConditionVariableSleep()
+--------------------------------------------------------------------------------------+
|-> XLogInsert |
+--------------------------------------------------------------------------------------+
-> RecordNewMultiXact
-> LWLockAcquire(MultiXactOffset)
-> write offset 101
-> LWLockRelease(MultiXactOffset)
-> ConditionVariableBroadcast(nextoff_cv);
-> retry:
-> LWLockAcquire(MultiXactOffset)
-> read offset 100
-> read offset 101
-> LWLockRelease(MultiXactOffset)
offset 101 != 0
-> length = offset 101 - read offset 100
-> LWLockAcquire(MultiXactMember)
-> write members 101
-> LWLockRelease(MultiXactOffset)
--------------------------------------------------------------------------------------
As far as I can see, your proposal seems to address exactly that problem.
The main difference from the former solution is writing to MultiXactOffset SLRU all required
information for the reader atomically on multiact insertion.
Before this change, we actually extended the multixact insertion time window to the next multixact
insertion time, and it seems a risky design.
I illustrate the new solution using the next diagram.
Writer Reader
--------------------------------------------------------------------------------
MultiXactIdCreateFromMembers
-> GetNewMultiXactId (100)
-> XLogInsert
-> RecordNewMultiXact
-> LWLockAcquire(MultiXactOffset)
-> write offset 100
-> write offset 101 *****************************************************************
-> LWLockRelease(MultiXactOffset)
GetMultiXactIdMembers(100)
-> LWLockAcquire(MultiXactOffset)
-> read offset 100
-> read offset 101
-> LWLockRelease(MultiXactOffset)
Assert(offset 101 == 0)
-> ConditionVariableSleep()
-> length = offset 101 - read offset 100
--------------------------------------------------------------------------------------
So if I understand the core idea of your solution right, I think that the code in the last patch
(v10-0001-Avoid-multixact-edge-case-2-by-writing-the-next-.patch) is correct and does what it should.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Peter Eisentraut | 2025-11-24 14:45:48 | Re: refactor AlterDomainAddConstraint (alter domain add constraint) |
| Previous Message | Chao Li | 2025-11-24 12:03:22 | Re: get rid of Pointer type, mostly |