Re: Improve WALRead() to suck data directly from WAL buffers when possible

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improve WALRead() to suck data directly from WAL buffers when possible
Date: 2023-03-03 13:30:00
Message-ID: CALj2ACUpQGiwQTzmoSMOFk5=WiJc06FcYpxzBX0SEej4ProRzg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 1, 2023 at 9:45 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Tue, Feb 28, 2023 at 10:38:31AM +0530, Bharath Rupireddy wrote:
> > On Tue, Feb 28, 2023 at 6:14 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> >> Why do we only read a page at a time in XLogReadFromBuffersGuts()? What is
> >> preventing us from copying all the data we need in one go?
> >
> > Note that most of the WALRead() callers request a single page of
> > XLOG_BLCKSZ bytes even if the server has fewer or more WAL pages
> > available. It's the streaming replication walsender that can request
> > less than XLOG_BLCKSZ bytes and up to MAX_SEND_SIZE (16 * XLOG_BLCKSZ).
> > And, if we read, say, MAX_SEND_SIZE at once while holding
> > WALBufMappingLock, that might impact concurrent inserters (at least in
> > theory) - one of the main intentions of this patch is not to impact
> > inserters much.
>
> Perhaps we should test both approaches to see if there is a noticeable
> difference. It might not be great for concurrent inserts to repeatedly
> take the lock, either. If there's no real difference, we might be able to
> simplify the code a bit.

I took a stab at this - acquiring WALBufMappingLock separately for each
requested WAL buffer page vs. acquiring WALBufMappingLock once for all
requested WAL buffer pages. I chose the pgbench tpcb-like benchmark,
which has 3 UPDATE statements and 1 INSERT statement. I ran pgbench for
30 minutes with scale factor 100 and 4096 clients against a primary with
1 async standby, see [1]. I captured wait events to check for contention
on WALBufMappingLock. I didn't notice any contention on the lock, and
there was no difference in TPS either: see [2] for results on HEAD, [3]
for results with the v6 patch, which uses the "acquire WALBufMappingLock
separately for each requested WAL buffer page" strategy, and [4] for
results with the v7 patch (attached herewith), which uses the "acquire
WALBufMappingLock once for all requested WAL buffer pages" strategy.
Another thing to note from the test results is the reduction in WALRead
IO wait events from 136 on HEAD to 1 with either the v6 or v7 patch. So
reading from WAL buffers is really helping here.

Based on these observations, I'd like to go with the approach that
acquires WALBufMappingLock once for all requested WAL buffer pages,
unlike v6 and the previous patches.

I'm attaching the v7 patch set with this change for further review.
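
For reference, here's a minimal sketch of the lock-once strategy. This is
not the v7 patch itself; the function name, the shared-mode lock
acquisition, and the fallback behaviour are assumptions for illustration,
and it presumes the code lives in xlog.c where XLogCtl and
XLogRecPtrToBufIdx() are visible.

/*
 * Hypothetical sketch only (not the v7 patch): copy up to "count" bytes
 * of WAL starting at "startptr" from WAL buffers into "dstbuf", taking
 * WALBufMappingLock once for all requested pages instead of re-acquiring
 * it per page.  Shared-mode acquisition is an assumption here.
 */
static Size
XLogReadFromBuffersSketch(char *dstbuf, XLogRecPtr startptr, Size count)
{
	char	   *dst = dstbuf;
	XLogRecPtr	ptr = startptr;
	Size		nread = 0;

	LWLockAcquire(WALBufMappingLock, LW_SHARED);

	while (nread < count)
	{
		int			idx = XLogRecPtrToBufIdx(ptr);
		Size		offset = ptr % XLOG_BLCKSZ;
		Size		nbytes = Min(count - nread, XLOG_BLCKSZ - offset);
		XLogRecPtr	expectedEndPtr = ptr - offset + XLOG_BLCKSZ;

		/*
		 * Stop if this buffer slot no longer holds the page we want; the
		 * caller would then fall back to reading the rest from WAL files.
		 */
		if (XLogCtl->xlblocks[idx] != expectedEndPtr)
			break;

		memcpy(dst, XLogCtl->pages + idx * (Size) XLOG_BLCKSZ + offset, nbytes);

		dst += nbytes;
		ptr += nbytes;
		nread += nbytes;
	}

	LWLockRelease(WALBufMappingLock);

	return nread;				/* number of bytes actually copied */
}

The per-page variant (v6) would simply move the LWLockAcquire()/
LWLockRelease() pair inside the loop; per the wait event numbers below,
the difference doesn't show up for this workload.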

[1]
shared_buffers = '8GB'
wal_buffers = '1GB'
max_wal_size = '16GB'
max_connections = '5000'
archive_mode = 'on'
archive_command='cp %p /home/ubuntu/archived_wal/%f'
./pgbench --initialize --scale=100 postgres
./pgbench -n -M prepared -U ubuntu postgres -b tpcb-like -c4096 -j4096 -T1800

[2]
HEAD:
done in 20.03 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 15.53 s, vacuum 0.19 s, primary keys 4.30 s).
tps = 11654.475345 (without initial connection time)

50950253 Lock | transactionid
16472447 Lock | tuple
3869523 LWLock | LockManager
739283 IPC | ProcArrayGroupUpdate
718549 |
439877 LWLock | WALWrite
130737 Client | ClientRead
121113 LWLock | BufferContent
70778 LWLock | WALInsert
43346 IPC | XactGroupUpdate
18547
18546 Activity | LogicalLauncherMain
18545 Activity | AutoVacuumMain
18272 Activity | ArchiverMain
17627 Activity | WalSenderMain
17207 Activity | WalWriterMain
15455 IO | WALSync
14963 LWLock | ProcArray
14747 LWLock | XactSLRU
13943 Timeout | CheckpointWriteDelay
10519 Activity | BgWriterHibernate
8022 Activity | BgWriterMain
4486 Timeout | SpinDelay
4443 Activity | CheckpointerMain
1435 Lock | extend
670 LWLock | XidGen
373 IO | WALWrite
283 Timeout | VacuumDelay
268 IPC | ArchiveCommand
249 Timeout | VacuumTruncate
136 IO | WALRead
115 IO | WALInitSync
74 IO | DataFileWrite
67 IO | WALInitWrite
36 IO | DataFileFlush
35 IO | DataFileExtend
17 IO | DataFileRead
4 IO | SLRUWrite
3 IO | BufFileWrite
2 IO | DataFileImmediateSync
1 LWLock | SInvalWrite
1 LWLock | LockFastPath
1 IO | ControlFileSyncUpdate

[3]
v6 patch:
done in 19.99 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 15.52 s, vacuum 0.18 s, primary keys 4.28 s).
tps = 11689.584538 (without initial connection time)

50678977 Lock | transactionid
16252048 Lock | tuple
4146827 LWLock | LockManager
768256 |
719923 IPC | ProcArrayGroupUpdate
432836 LWLock | WALWrite
140354 Client | ClientRead
124203 LWLock | BufferContent
74355 LWLock | WALInsert
39852 IPC | XactGroupUpdate
30728
30727 Activity | LogicalLauncherMain
30726 Activity | AutoVacuumMain
30420 Activity | ArchiverMain
29881 Activity | WalSenderMain
29418 Activity | WalWriterMain
23428 Activity | BgWriterHibernate
15960 Timeout | CheckpointWriteDelay
15840 IO | WALSync
15066 LWLock | ProcArray
14577 Activity | CheckpointerMain
14377 LWLock | XactSLRU
7291 Activity | BgWriterMain
4336 Timeout | SpinDelay
1707 Lock | extend
720 LWLock | XidGen
362 Timeout | VacuumTruncate
360 IO | WALWrite
304 Timeout | VacuumDelay
301 IPC | ArchiveCommand
106 IO | WALInitSync
82 IO | DataFileWrite
66 IO | WALInitWrite
45 IO | DataFileFlush
25 IO | DataFileExtend
18 IO | DataFileRead
5 LWLock | LockFastPath
2 IO | DataFileSync
2 IO | DataFileImmediateSync
1 LWLock | BufferMapping
1 IO | WALRead
1 IO | SLRUWrite
1 IO | SLRURead
1 IO | ReplicationSlotSync
1 IO | BufFileRead

[4]
v7 patch:
done in 19.92 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 15.53 s, vacuum 0.23 s, primary keys 4.16 s).
tps = 11671.869074 (without initial connection time)

50614021 Lock | transactionid
16482561 Lock | tuple
4086451 LWLock | LockManager
777507 |
714329 IPC | ProcArrayGroupUpdate
420593 LWLock | WALWrite
138142 Client | ClientRead
125381 LWLock | BufferContent
75283 LWLock | WALInsert
38759 IPC | XactGroupUpdate
20283
20282 Activity | LogicalLauncherMain
20281 Activity | AutoVacuumMain
20002 Activity | ArchiverMain
19467 Activity | WalSenderMain
19036 Activity | WalWriterMain
15836 IO | WALSync
15708 Timeout | CheckpointWriteDelay
15346 LWLock | ProcArray
15095 LWLock | XactSLRU
11852 Activity | BgWriterHibernate
8424 Activity | BgWriterMain
4636 Timeout | SpinDelay
4415 Activity | CheckpointerMain
2048 Lock | extend
1457 Timeout | VacuumTruncate
646 LWLock | XidGen
402 IO | WALWrite
306 Timeout | VacuumDelay
278 IPC | ArchiveCommand
117 IO | WALInitSync
74 IO | DataFileWrite
66 IO | WALInitWrite
35 IO | DataFileFlush
29 IO | DataFileExtend
24 LWLock | LockFastPath
14 IO | DataFileRead
2 IO | SLRUWrite
2 IO | DataFileImmediateSync
2 IO | BufFileWrite
1 LWLock | BufferMapping
1 IO | WALRead
1 IO | SLRURead
1 IO | BufFileRead

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v7-0001-Improve-WALRead-to-suck-data-directly-from-WAL-bu.patch application/x-patch 7.2 KB
v7-0002-Add-test-module-for-verifying-WAL-read-from-WAL-b.patch application/x-patch 9.2 KB
