Re: MultiXact\SLRU buffers configuration

From: Gilles Darold <gilles(at)darold(dot)net>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-12-08 16:05:34
Message-ID: 6ba7eae2-8b0c-0690-11a5-e921e6586180@darold.net
Lists: pgsql-hackers

On 13/11/2020 at 12:49, Andrey Borodin wrote:
>
>> On 10 Nov 2020, at 23:07, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> On 11/10/20 7:16 AM, Andrey Borodin wrote:
>>>
>>> but this picture was not stable.
>>>
>> Seems we haven't made much progress in reproducing the issue :-( I guess
>> we'll need to know more about the machine where this happens. Is there
>> anything special about the hardware/config? Are you monitoring size of
>> the pg_multixact directory?
> It's Ubuntu 18.04.4 LTS, Intel Xeon E5-2660 v4, 56 CPU cores, with 256 GB of RAM.
> PostgreSQL 10.14, compiled by gcc 7.5.0, 64-bit
>
> No, unfortunately we do not have signals for SLRU sizes.
> 3.5 TB mdadm RAID10 over 28 SSD drives, 82% full.
>
> The first incident triggering the investigation was on 2020-04-19; at that time the cluster was running on PG 10.11. But I think it was happening before.
>
> I'd say nothing special...
>
>>> How do you collect wait events for aggregation? just insert into some table with cron?
>>>
>> No, I have a simple shell script (attached) sampling data from
>> pg_stat_activity regularly. Then I load it into a table and aggregate to
>> get a summary.
> Thanks!
>
> Best regards, Andrey Borodin.

Hi,

Some time ago I encountered contention on MultiXactOffsetControlLock
while running a performance benchmark. Here are the wait event
monitoring results, polled every 10 seconds over a 30 minute run of the
benchmark (a sketch of the sampling method follows the table):

 event_type |           event            |   sum
------------+----------------------------+----------
 Client     | ClientRead                 | 44722952
 LWLock     | MultiXactOffsetControlLock | 30343060
 LWLock     | multixact_offset           | 16735250
 LWLock     | MultiXactMemberControlLock |  1601470
 LWLock     | buffer_content             |   991344
 LWLock     | multixact_member           |   805624
 Lock       | transactionid              |   204997
 Activity   | LogicalLauncherMain        |   198834
 Activity   | CheckpointerMain           |   198834
 Activity   | AutoVacuumMain             |   198469
 Activity   | BgWriterMain               |   184066
 Activity   | WalWriterMain              |   171571
 LWLock     | WALWriteLock               |    72428
 IO         | DataFileRead               |    35708
 Activity   | BgWriterHibernate          |    12741
 IO         | SLRURead                   |     9121
 Lock       | relation                   |     8858
 LWLock     | ProcArrayLock              |     7309
 LWLock     | lock_manager               |     6677
 LWLock     | pg_stat_statements         |     4194
 LWLock     | buffer_mapping             |     3222

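For context, the sampling behind these numbers follows the same idea as
the shell script Tomas mentioned above: poll pg_stat_activity on an
interval, load the samples into a table, and aggregate at the end. A
minimal SQL sketch of that approach (table and column names are
illustrative, not the actual script used for this run):

    -- Run once: a table to accumulate the samples.
    CREATE TABLE wait_event_samples (
        sample_time     timestamptz,
        wait_event_type text,
        wait_event      text
    );

    -- Run every 10 seconds (e.g. from a shell loop or cron):
    INSERT INTO wait_event_samples
    SELECT now(), wait_event_type, wait_event
      FROM pg_stat_activity
     WHERE wait_event IS NOT NULL;

    -- Aggregate after the 30 minute run:
    SELECT wait_event_type AS event_type,
           wait_event      AS event,
           count(*)        AS sum
      FROM wait_event_samples
     GROUP BY 1, 2
     ORDER BY 3 DESC;
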
After reading this thread I changed the buffer sizes to 32 and 64 and
obtained the following results:

 event_type |           event            |    sum
------------+----------------------------+-----------
 Client     | ClientRead                 | 268297572
 LWLock     | MultiXactMemberControlLock |  65162906
 LWLock     | multixact_member           |  33397714
 LWLock     | buffer_content             |   4737065
 Lock       | transactionid              |   2143750
 LWLock     | SubtransControlLock        |   1318230
 LWLock     | WALWriteLock               |   1038999
 Activity   | LogicalLauncherMain        |    940598
 Activity   | AutoVacuumMain             |    938566
 Activity   | CheckpointerMain           |    799052
 Activity   | WalWriterMain              |    749069
 LWLock     | subtrans                   |    710163
 Activity   | BgWriterHibernate          |    536763
 Lock       | object                     |    514225
 Activity   | BgWriterMain               |    394206
 LWLock     | lock_manager               |    295616
 IO         | DataFileRead               |    274236
 LWLock     | ProcArrayLock              |     77099
 Lock       | tuple                      |     59043
 IO         | CopyFileWrite              |     45611
 Lock       | relation                   |     42714

There was still contention on multixact, but less than in the first
run. I then increased the buffers to 128 and 512 and obtained the best
results for this bench:

 event_type |           event            |    sum
------------+----------------------------+-----------
 Client     | ClientRead                 | 160463037
 LWLock     | MultiXactMemberControlLock |   5334188
 LWLock     | buffer_content             |   5228256
 LWLock     | buffer_mapping             |   2368505
 LWLock     | SubtransControlLock        |   2289977
 IPC        | ProcArrayGroupUpdate       |   1560875
 LWLock     | ProcArrayLock              |   1437750
 Lock       | transactionid              |    825561
 LWLock     | subtrans                   |    772701
 LWLock     | WALWriteLock               |    666138
 Activity   | LogicalLauncherMain        |    492585
 Activity   | CheckpointerMain           |    492458
 Activity   | AutoVacuumMain             |    491548
 LWLock     | lock_manager               |    426531
 Lock       | object                     |    403581
 Activity   | WalWriterMain              |    394668
 Activity   | BgWriterHibernate          |    293112
 Activity   | BgWriterMain               |    195312
 LWLock     | MultiXactGenLock           |    177820
 LWLock     | pg_stat_statements         |    173864
 IO         | DataFileRead               |    173009

I hope these metrics are of some interest in showing the utility of
this patch, but unfortunately I cannot be more precise or provide
reports for the entire patch. The problem is that this benchmark runs
against an application that uses PostgreSQL 11, and I could not
back-port the full patch; there have been too many changes since PG11.
I have just increased the values of NUM_MXACTOFFSET_BUFFERS and
NUM_MXACTMEMBER_BUFFERS. This allowed us to triple the number of
simultaneous connections between the first and the last test.
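
To be concrete, the only PG 11 change was bumping the two compile-time
constants and rebuilding; if I recall correctly they live in
src/include/access/multixact.h. Something along these lines (assuming
the 128/512 split was offsets/members; the stock values are 8 and 16):

    /* src/include/access/multixact.h (PG 11), last test configuration */
    #define NUM_MXACTOFFSET_BUFFERS     128     /* stock value: 8 */
    #define NUM_MXACTMEMBER_BUFFERS     512     /* stock value: 16 */

Since each SLRU buffer holds one 8 kB page, even 512 buffers only cost
about 4 MB of shared memory, which is why bumping them is cheap.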

I know that this report is not really helpful, but at least I can give
more information on the benchmark that was used. This is the
proprietary zRef benchmark, which compares the same Cobol programs
(transactional and batch) executed both on mainframes and on x86
servers. Instead of a DB2 z/OS database we use PostgreSQL v11. This
test makes extensive use of cursors (each select, even read-only, goes
through a cursor), and the contention was observed with updates on
tables that have foreign keys. There are no explicit FOR SHARE clauses
in the queries, only some FOR UPDATE; I guess the multixact contention
comes from the FOR KEY SHARE row locks taken by the FK checks (a small
illustration follows).
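
As a minimal illustration of where those multixacts come from (a
hypothetical two-session scenario, not taken from the benchmark
itself): the FK check takes a FOR KEY SHARE lock on the referenced row,
and as soon as two transactions hold such a lock on the same row, both
lockers have to be recorded in a multixact:

    CREATE TABLE parent (id int PRIMARY KEY);
    CREATE TABLE child  (id int PRIMARY KEY,
                         parent_id int REFERENCES parent(id));
    INSERT INTO parent VALUES (1);

    -- Session A:
    BEGIN;
    INSERT INTO child VALUES (1, 1);  -- FK check locks parent row 1
                                      -- FOR KEY SHARE

    -- Session B, concurrently:
    BEGIN;
    INSERT INTO child VALUES (2, 1);  -- second KEY SHARE lock on the
                                      -- same row; both lockers are now
                                      -- stored in a multixact

With many concurrent sessions inserting or updating children of the
same parent rows, multixact traffic grows quickly, hence the pressure
on the multixact SLRUs.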

So in our case, being able to tune the multixact buffers could help a
lot to improve performance.

--
Gilles Darold
