Re: MultiXact\SLRU buffers configuration

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-05-14 06:44:01
Message-ID: 102F673B-C9C5-4823-A92F-96799FC61ED0@yandex-team.ru
Lists: pgsql-hackers

> On 14 May 2020, at 11:16, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Thu, 14 May 2020 10:19:42 +0500, "Andrey M. Borodin" <x4mmm(at)yandex-team(dot)ru> wrote in
>>>> I'm looking more at MultiXact and it seems to me that we have a race condition there.
>>>>
>>>> When we create a new MultiXact we do:
>>>> 1. Generate new MultiXactId under MultiXactGenLock
>>>> 2. Record new mxid with members and offset to WAL
>>>> 3. Write offset to SLRU under MultiXactOffsetControlLock
>>>> 4. Write members to SLRU under MultiXactMemberControlLock
>>>
>>> But, don't we hold exclusive lock on the buffer through all the steps
>>> above?
>> Yes... Unless the MultiXact is observed on a standby. This could lead to observing an inconsistent snapshot: one of the lockers committed a tuple delete, but the standby still sees the tuple as alive.
>
> Ah, right. I was looking from GetNewMultiXactId. Actually
> XLOG_MULTIXACT_CREATE_ID is not protected from concurrent references to
> the mxact id being created, and GetMultiXactIdMembers does consider that
> case.
>
>>>> When we read MultiXact we do:
>>>> 1. Retrieve offset by mxid from SLRU under MultiXactOffsetControlLock
>>>> 2. If the offset is 0 it has not been filled in yet (step 3 of the previous algorithm); we sleep and goto 1
>>>> 3. Retrieve members from SLRU under MultiXactMemberControlLock
>>>> 4. ..... what do we do if there are just zeroes because step 4 has not been executed yet? Nothing, we return an empty members list.
>>>
>>> So transactions never see such incomplete mxids, I believe.
>> I've observed the sleep at step 2. I believe it's possible to observe the effects of step 4 too.
>> Maybe we could add a lock on the standby to get rid of this 1000us wait? Sometimes it hits standbys hard: if someone locks a whole table on the primary, all seq scans on the standbys follow it into MultiXactOffsetControlLock contention.
>
> GetMultiXactIdMembers believes that step 4 has been done successfully if
> step 2 returned a valid offset, but actually that is not obvious.
>
> If we add a single giant lock just to isolate, say,
> GetMultiXactIdMembers and RecordNewMultiXact, it reduces concurrency
> unnecessarily. Perhaps we need a finer-grained locking key for the standby
> that works similarly to the buffer lock on the primary and doesn't cause
> conflicts between irrelevant mxids.
>
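To make the interleaving from the quoted steps concrete, here is a minimal standalone sketch (my own toy code, not PostgreSQL internals: the SLRU pages and their LWLocks are replaced by two shared arrays, and a sleep just widens the window). The creator publishes the offset before the members, and a reader that spins on the offset can still come back with an all-zero members array:

/*
 * Toy illustration of the race (NOT PostgreSQL code): the "creator"
 * publishes offset and members in two separate steps, the "reader"
 * polls the offset and may still see zeroed members.
 * Build with: cc -pthread race.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define NMEMBERS 2

static atomic_uint offset;            /* 0 = "not written yet", like the offsets SLRU */
static atomic_uint members[NMEMBERS]; /* zero-initialized, like the members SLRU */

static void *creator(void *arg)
{
    atomic_store(&offset, 42);                  /* step 3: write offset */
    usleep(1000);                               /* widen the window between steps 3 and 4 */
    for (unsigned i = 0; i < NMEMBERS; i++)
        atomic_store(&members[i], 100 + i);     /* step 4: write members */
    return NULL;
}

static void *reader(void *arg)
{
    while (atomic_load(&offset) == 0)           /* read steps 1-2: retry while offset is 0 */
        usleep(10);
    for (unsigned i = 0; i < NMEMBERS; i++)     /* read steps 3-4: may print zeroes */
        printf("member[%u] = %u\n", i, atomic_load(&members[i]));
    return NULL;
}

int main(void)
{
    pthread_t c, r;

    pthread_create(&r, NULL, reader, NULL);
    pthread_create(&c, NULL, creator, NULL);
    pthread_join(c, NULL);
    pthread_join(r, NULL);
    return 0;
}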
We can just replay members before offsets. If the offset is already there, the members are there too.
But I'd be happy if we could mitigate those 1000us waits too - with a hint about the last mxid's state in shared MultiXact state, for example.
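
In terms of the toy sketch above, "members before offsets" just means swapping the two publication steps, so that a non-zero offset implies the members are already visible (here the atomics provide the ordering; on a real standby the WAL replay order plus the SLRU locks would have to give the equivalent guarantee). A drop-in replacement for the creator above:

/*
 * Toy version of the reordered publication: members first, offset last.
 * A reader that sees a non-zero offset is then guaranteed to see the
 * members as well.
 */
static void *creator_members_first(void *arg)
{
    for (unsigned i = 0; i < NMEMBERS; i++)
        atomic_store(&members[i], 100 + i);     /* members first */
    atomic_store(&offset, 42);                  /* offset published last */
    return NULL;
}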

Actually, if we read an empty mxid members array instead of something that has only just been replayed, it's not an inconsistency problem, because a transaction in that mxid could not have committed before we started. ISTM.
So instead of a fix we can probably just add a comment, if this reasoning is correct.
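
If that reasoning holds, the reader side only needs to treat an all-zero members array as "no lockers that matter yet" rather than as corruption. Again on the toy structures above (hypothetical helper, not the real GetMultiXactIdMembers):

static unsigned get_members(unsigned out[NMEMBERS])
{
    unsigned n = 0;

    for (unsigned i = 0; i < NMEMBERS; i++)
    {
        unsigned m = atomic_load(&members[i]);

        if (m != 0)
            out[n++] = m;
    }
    /*
     * n == 0: the creator has not finished step 4; its transaction cannot
     * have committed before we started, so an empty list is safe to return.
     */
    return n;
}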

Best regards, Andrey Borodin.
