Re: MultiXact\SLRU buffers configuration

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-10-28 23:32:43
Message-ID: 20201028233243.ygm6yqlynkqpzekr@development
Lists: pgsql-hackers

Hi,

On Wed, Oct 28, 2020 at 12:34:58PM +0500, Andrey Borodin wrote:
>Tomas, thanks for looking into this!
>
>> On 28 Oct 2020, at 06:36, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>>
>> This thread started with a discussion about making the SLRU sizes
>> configurable, but this patch version only adds a local cache. Does this
>> achieve the same goal, or would we still gain something by having GUCs
>> for the SLRUs?
>>
>> If we're claiming this improves performance, it'd be good to have some
>> workload demonstrating that and measurements. I don't see anything like
>> that in this thread, so it's a bit hand-wavy. Can someone share details
>> of such workload (even synthetic one) and some basic measurements?
>
>All patches in this thread aim at the same goal: improving performance in the presence of MultiXact lock contention.
>I could not build a synthetic reproduction of the problem, but I did do some MultiXact stressing here [0]. It's a clumsy test program, because it's still not clear to me which workload parameters trigger MultiXact lock contention. In the generic case I kept running into other locks such as XidGenLock, MultiXactGenLock, etc. Yet our production system has hit this problem roughly once a month throughout this year.
>
>The test program share-locks different sets of tuples in the presence of concurrent full scans.
>To produce a set of locks we pick one of 14 bits; if a row number has that bit set to 0, we lock the row.
>I measured the time to lock all rows 3 times for each of the 14 bits, observing the total time needed to set all the locks.
>During the test I watched the locks in pg_stat_activity; if they did not contain enough MultiXact locks, I tuned the parameters further (number of concurrent clients, number of bits, select queries, etc.).
>
>Why is it so complicated? Because other attempts to reproduce the problem seemed to run into other locks.
>

It's not my intention to be mean or anything like that, but to me this
means we don't really understand the problem we're trying to solve. Had
we understood it, we would be able to construct a workload reproducing
the issue ...
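
Just so we're on the same page, here's a minimal sketch of the locking
pattern described above (the table, its size and the :bit variable are
made up for illustration; this is not the actual test program from [0]):

  -- hypothetical setup
  CREATE TABLE mx_test (id int PRIMARY KEY, payload int);
  INSERT INTO mx_test SELECT g, 0 FROM generate_series(1, 1000000) g;

  -- each concurrent session picks one of the 14 bits (:bit is a psql /
  -- pgbench variable) and share-locks every row with that bit unset, so
  -- overlapping lockers force creation of multixacts
  BEGIN;
  SELECT id FROM mx_test WHERE (id & (1 << :bit)) = 0 FOR SHARE;
  COMMIT;

  -- meanwhile, other sessions run full scans
  SELECT count(*) FROM mx_test;

If that's roughly what the test does, it would help to know how many
sessions and what data sizes were needed before the MultiXact waits
started to dominate.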

I understand what the individual patches are doing, and maybe those
changes are desirable in general. But without any benchmarks from a
plausible workload I find it hard to convince myself that:

(a) it actually will help with the issue you're observing on production

and

(b) it's actually worth the extra complexity (e.g. the lwlock changes)

I'm willing to invest some of my time into reviewing/testing this, but I
think we badly need better insight into the issue, so that we can build
a workload reproducing it. Perhaps collecting some perf profiles and a
sample of the queries might help, but I assume you already tried that.
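
FWIW even a coarse sample of wait events collected while the issue is
happening would tell us a lot. Something like this, run repeatedly
(just a sketch; the MultiXact-related wait_event names differ a bit
between versions):

  SELECT wait_event_type, wait_event, count(*)
    FROM pg_stat_activity
   WHERE wait_event IS NOT NULL
   GROUP BY 1, 2
   ORDER BY count(*) DESC;

Together with the perf profiles, that should point fairly clearly at
the SLRU/lwlock paths that are actually hot.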

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
