Re: MultiXact\SLRU buffers configuration

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: x4mmm(at)yandex-team(dot)ru
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-05-20 04:54:04
Message-ID: 20200520.135404.64670166185539892.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Fri, 15 May 2020 14:01:46 +0500, "Andrey M. Borodin" <x4mmm(at)yandex-team(dot)ru> wrote in
>
>
> > On 15 May 2020, at 05:03, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >
> > At Thu, 14 May 2020 11:44:01 +0500, "Andrey M. Borodin" <x4mmm(at)yandex-team(dot)ru> wrote in
> >>> GetMultiXactIdMembers believes that step 4 has completed successfully
> >>> if step 2 returned a valid offset, but actually that is not obvious.
> >>>
> >>> If we add a single giant lock just to isolate, say,
> >>> GetMultiXactIdMembers and RecordNewMultiXact, it reduces concurrency
> >>> unnecessarily. Perhaps we need a finer-grained locking key for the standby
> >>> that works similarly to a buffer lock on the primary and doesn't cause
> >>> conflicts between irrelevant mxids.
> >>>
> >> We can just replay members before offsets. If the offset is already there, the members are there too.
> >> But I'd be happy if we could mitigate those 1000us too, for example with a hint about the last mxid state in a shared MX state.
> >
> > Generally in such cases, condition variables would work. In the
> > attached PoC, the reader side gets no penalty in the "likely" code
> > path. The writer side always calls ConditionVariableBroadcast but the
> > waiter list is empty in almost all cases. But I couldn't reproduce the
> > situation where the 1000us sleep is reached.
> Thanks! That really looks like a good solution without magic timeouts. Beautiful!
> I think I can create a temporary extension which calls the MultiXact API and tests edge cases like this 1000us wait.
> This extension will also be useful for me to assess the impact of bigger buffers, reduced read locking (as in my 2nd patch) and other tweaks.

Happy to hear that. It would need to use a timeout just in case, though.
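
As a rough illustration of waiting on a condition variable with a timeout as a safety net, here is a minimal standalone sketch. It uses POSIX threads rather than PostgreSQL's ConditionVariable API, and DemoOffsetSlot, demo_slot_init, demo_slot_fill and demo_slot_wait are hypothetical names that do not come from the PoC patch. Unlike the PoC, the reader here always takes a mutex, so it only shows the shape of the wait, not the "no penalty in the likely path" property.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for shared multixact state; not PostgreSQL code. */
typedef struct
{
    pthread_mutex_t lock;
    pthread_cond_t  filled;      /* broadcast whenever next_offset is set */
    uint32_t        next_offset; /* 0 means "not filled in yet" */
} DemoOffsetSlot;

void
demo_slot_init(DemoOffsetSlot *slot)
{
    pthread_mutex_init(&slot->lock, NULL);
    pthread_cond_init(&slot->filled, NULL);
    slot->next_offset = 0;
}

/* Writer side: publish the offset, then wake every waiter (usually none). */
void
demo_slot_fill(DemoOffsetSlot *slot, uint32_t offset)
{
    pthread_mutex_lock(&slot->lock);
    slot->next_offset = offset;
    pthread_cond_broadcast(&slot->filled);
    pthread_mutex_unlock(&slot->lock);
}

/*
 * Reader side: wait until the offset is filled in or timeout_ms elapses.
 * Returns true and stores the offset on success, false on timeout.
 */
bool
demo_slot_wait(DemoOffsetSlot *slot, long timeout_ms, uint32_t *offset)
{
    struct timespec deadline;
    bool            ok;

    assert(timeout_ms >= 0);

    /* pthread_cond_timedwait takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L)
    {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&slot->lock);
    /* Loop to tolerate spurious wakeups; give up once the deadline passes. */
    while (slot->next_offset == 0)
    {
        if (pthread_cond_timedwait(&slot->filled, &slot->lock,
                                   &deadline) == ETIMEDOUT)
            break;
    }
    ok = (slot->next_offset != 0);
    if (ok)
        *offset = slot->next_offset;
    pthread_mutex_unlock(&slot->lock);

    return ok;
}
```

A reader would call demo_slot_wait(&slot, 1000, &off) and treat a false return as the "just in case" path, for example by rechecking the state or reporting an error.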

> >> Actually, if we read an empty mxid array instead of something that is being replayed just now, it's not an inconsistency problem, because the transaction in this mxid could not have committed before we started. ISTM.
> >> So instead of a fix, we can probably just add a comment, if this reasoning is correct.
> >
> > Step 4 of the reader side reads the members of the target mxid. They
> > are already written if the offset of the *next* mxid is filled in.
> Most often, yes, but members are not guaranteed to be filled in order. Whoever wins MXMemberControlLock will write first.
> But nobody can read the members of an MXID before it is returned, and its members will be written before the MXID is returned.

Yeah, right. Otherwise an assertion failure happens.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
