Re: MultiXact\SLRU buffers configuration

From: Gilles Darold <gilles(at)darold(dot)net>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-12-11 17:50:25
Message-ID: 3319917a-679e-b07d-b194-473552b72082@darold.net
Lists: pgsql-hackers

On 10/12/2020 at 15:45, Gilles Darold wrote:
> On 08/12/2020 at 18:52, Andrey Borodin wrote:
>> Hi Gilles!
>>
>> Many thanks for your message!
>>
>>> On Dec 8, 2020, at 21:05, Gilles Darold <gilles(at)darold(dot)net> wrote:
>>>
>>> I know that this report is not really helpful
>> Quite the contrary - these benchmarks prove that a controllable reproduction exists. I've rebased the patches for PG11. Can you please benchmark them (without extending the SLRU)?
>>
>> Best regards, Andrey Borodin.
>>
> Hi,
>
>
> Running tests yesterday with the patches produced a lot of failures
> with the following error on INSERT and UPDATE statements:
>
>
> ERROR:  lock MultiXactOffsetControlLock is not held
>
>
> After a patch review this morning I think I have found what's going
> wrong. In patch
> v6-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offs.patch I
> think there is a missing reinitialisation of the lockmode variable to
> LW_NONE inside the retry loop, after the call to LWLockRelease() in
> src/backend/access/transam/multixact.c:1392:GetMultiXactIdMembers().
> I've attached a new version of the patch for master that includes the
> fix; I'm now using it with PG11 and everything works very well.
>
>
> I'm running more tests to see the performance impact of playing
> with multixact_offsets_slru_buffers, multixact_members_slru_buffers
> and multixact_local_cache_entries. I will report the results later today.
>

Hi,

Sorry for the delay. I have done some further tests to try to reach the
limit without bottlenecks on multixact or shared buffers. The tests were
done on a Microsoft Azure machine with 2TB of RAM and 4 Intel Xeon
Platinum 8280M sockets (128 CPUs). PG configuration:

    max_connections = 4096
    shared_buffers = 64GB
    max_prepared_transactions = 2048
    work_mem = 256MB
    maintenance_work_mem = 2GB
    wal_level = minimal
    synchronous_commit = off
    commit_delay = 1000
    commit_siblings = 10
    checkpoint_timeout = 1h
    max_wal_size = 32GB
    checkpoint_completion_target = 0.9

I have tested several values for the different buffer variables,
starting from:

    multixact_offsets_slru_buffers = 64
    multixact_members_slru_buffers = 128
    multixact_local_cache_entries = 256

up to the values that gave the best performance in this test, avoiding
MultiXactOffsetControlLock or MultiXactMemberControlLock waits:

    multixact_offsets_slru_buffers = 128
    multixact_members_slru_buffers = 512
    multixact_local_cache_entries = 1024

shared_buffers was also increased, up to 256GB, to avoid
buffer_mapping contention.

Our best test run so far reports the following wait events:

     event_type |           event            |    sum
    ------------+----------------------------+-----------
     Client     | ClientRead                 | 321690211
     LWLock     | buffer_content             |   2970016
     IPC        | ProcArrayGroupUpdate       |   2317388
     LWLock     | ProcArrayLock              |   1445828
     LWLock     | WALWriteLock               |   1187606
     LWLock     | SubtransControlLock        |    972889
     Lock       | transactionid              |    840560
     Lock       | relation                   |    587600
     Activity   | LogicalLauncherMain        |    529599
     Activity   | AutoVacuumMain             |    528097

At this stage I don't think we can get better performance by tuning
these buffers, at least with PG11.

Regarding the performance gain from the shared-lock patch in
GetMultiXactIdMembers, unfortunately I cannot see a difference with or
without the patch; this could be related to our particular benchmark. But
the multixact buffers patch should clearly be committed, as it is really
helpful to be able to tune PG when multixact bottlenecks are found.
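
For reference, the lockmode fix described above boils down to the
following pattern (a simplified sketch of how I read the patched
GetMultiXactIdMembers() retry loop, not the exact patch code; LW_NONE is
the sentinel the patch uses to track "lock not held", and the
must_wait_for_concurrent_creation condition is a placeholder name):

    LWLockMode lockmode = LW_NONE;  /* patch-specific sentinel, not a
                                     * stock LWLockMode value */

    retry:
        if (lockmode == LW_NONE)
        {
            /* First attempt takes the lock in shared mode only. */
            LWLockAcquire(MultiXactOffsetControlLock, LW_SHARED);
            lockmode = LW_SHARED;
        }

        /* ... read the offsets/members SLRU pages ... */

        if (must_wait_for_concurrent_creation)
        {
            LWLockRelease(MultiXactOffsetControlLock);
            lockmode = LW_NONE; /* the missing reset: without it the
                                 * next iteration skips LWLockAcquire()
                                 * and the later LWLockRelease() fails
                                 * with "lock MultiXactOffsetControlLock
                                 * is not held" */
            pg_usleep(1000L);
            goto retry;
        }

        LWLockRelease(MultiXactOffsetControlLock);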

Best regards,

--
Gilles Darold
LzLabs GmbH
https://www.lzlabs.com/
