Re: MultiXact\SLRU buffers configuration

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2020-11-10 00:13:22
Message-ID: 9b4d17df-b811-8323-16be-3cab913216d1@enterprisedb.com

Hi,

After the issue reported in [1] got fixed, I've restarted the multi-xact
stress test, hoping to reproduce the SLRU locking issue. But so far no
luck :-(

I've started slightly different tests on two machines - on one machine
I've done this:

a) init.sql

create table t (a int);
insert into t select i from generate_series(1,100000000) s(i);
alter table t add primary key (a);

b) select.sql

-- all clients compute the same "random" row within any given second,
-- so that row gets key-share locked by many sessions concurrently
SELECT * FROM t
 WHERE a = (1+mod(abs(hashint4(extract(epoch from now())::int)),
                  100000000)) FOR KEY SHARE;

c) pgbench -n -c 32 -j 8 -f select.sql -T $((24*3600)) test

The idea is to have a large table and many clients hitting a small
random subset of the rows, so that the same rows keep getting key-share
locked by multiple sessions at once, forcing new multixacts to be
created.
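
To illustrate the mechanism the test is meant to exercise (a minimal
sketch, not part of the benchmark itself): when a second session
key-share locks a row that's already locked, the row's xmax can no
longer hold a single xid, so it gets replaced by a multixact.

-- session 1
begin;
select * from t where a = 1 for key share;

-- session 2, concurrently: the plain locker xid in the row's xmax is
-- replaced by a new multixact holding both lockers
begin;
select * from t where a = 1 for key share;

-- both sessions
commit;

A sample of wait events from a ~24h run looks like this: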

  e_type  |        e_name        |   sum
----------+----------------------+----------
 LWLock   | BufferContent        | 13913863
          |                      |  7194679
 LWLock   | WALWrite             |  1710507
 Activity | LogicalLauncherMain  |   726599
 Activity | AutoVacuumMain       |   726127
 Activity | WalWriterMain        |   725183
 Activity | CheckpointerMain     |   604694
 Client   | ClientRead           |   599900
 IO       | WALSync              |   502904
 Activity | BgWriterMain         |   378110
 Activity | BgWriterHibernate    |   348464
 IO       | WALWrite             |   129412
 LWLock   | ProcArray            |     6633
 LWLock   | WALInsert            |     5714
 IO       | SLRUWrite            |     2580
 IPC      | ProcArrayGroupUpdate |     2216
 LWLock   | XactSLRU             |     2196
 Timeout  | VacuumDelay          |     1078
 IPC      | XactGroupUpdate      |      737
 LWLock   | LockManager          |      503
 LWLock   | WALBufMapping        |      295
 LWLock   | MultiXactMemberSLRU  |      267
 IO       | DataFileWrite        |       68
 LWLock   | BufferIO             |       59
 IO       | DataFileRead         |       27
 IO       | DataFileFlush        |       14
 LWLock   | MultiXactGen         |        7
 LWLock   | BufferMapping        |        1

So, nothing particularly interesting - there certainly are not many wait
events related to SLRU.
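
For reference, samples like the above can be collected by polling
pg_stat_activity at a fixed interval - a rough sketch (the actual
sampling script isn't included here, and the wait_samples table name is
just made up for the example):

-- take one sample per backend (excluding this session); repeat at a
-- fixed interval, e.g. with \watch 0.1 in psql
create table wait_samples (e_type text, e_name text);

insert into wait_samples
select wait_event_type, wait_event
from pg_stat_activity
where pid <> pg_backend_pid();

-- aggregate at the end of the run
select e_type, e_name, count(*) as sum
from wait_samples
group by e_type, e_name
order by sum desc;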

On the other machine I did this:

a) init.sql
create table t (a int primary key);
insert into t select i from generate_series(1,1000) s(i);

b) select.sql
-- each transaction key-share locks all 1000 rows, so every row is
-- constantly locked by multiple sessions at once
select * from t for key share;

c) pgbench -n -c 32 -j 8 -f select.sql -T $((24*3600)) test

and the wait events (again from a ~24h run) look like this:

  e_type   |        e_name         |   sum
-----------+-----------------------+----------
 LWLock    | BufferContent         | 20804925
           |                       |  2575369
 Activity  | LogicalLauncherMain   |   745780
 Activity  | AutoVacuumMain        |   745292
 Activity  | WalWriterMain         |   740507
 Activity  | CheckpointerMain      |   737691
 Activity  | BgWriterHibernate     |   731123
 LWLock    | WALWrite              |   570107
 IO        | WALSync               |   452603
 Client    | ClientRead            |   151438
 BufferPin | BufferPin             |    23466
 LWLock    | WALInsert             |    21631
 IO        | WALWrite              |    19050
 LWLock    | ProcArray             |    15082
 Activity  | BgWriterMain          |    14655
 IPC       | ProcArrayGroupUpdate  |     7772
 LWLock    | WALBufMapping         |     3555
 IO        | SLRUWrite             |     1951
 LWLock    | MultiXactGen          |     1661
 LWLock    | MultiXactMemberSLRU   |      359
 LWLock    | MultiXactOffsetSLRU   |      242
 LWLock    | XactSLRU              |      141
 IPC       | XactGroupUpdate       |      104
 LWLock    | LockManager           |       28
 IO        | DataFileRead          |        4
 IO        | ControlFileSyncUpdate |        1
 Timeout   | VacuumDelay           |        1
 IO        | WALInitWrite          |        1

Also nothing particularly interesting - only a few SLRU wait events.
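
As a sanity check that these workloads actually touch the multixact
SLRUs at all, the pg_stat_slru view (available since PG 13) can be
sampled before and after a run - e.g.:

-- cumulative activity of the multixact SLRU caches
select name, blks_zeroed, blks_hit, blks_read, blks_written
from pg_stat_slru
where name in ('MultiXactOffset', 'MultiXactMember');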

So unfortunately this does not really reproduce the SLRU locking issues
you're observing - clearly, there has to be something else triggering
them. Perhaps this workload is too simplistic, or maybe we need to run
different queries. Or maybe the hardware needs to be somewhat different
(more CPUs? different storage?).

[1]
https://www.postgresql.org/message-id/20201104013205.icogbi773przyny5@development

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
