Re: WIP: WAL prefetch (another approach)

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2021-02-04 00:40:26
Message-ID: c5d52837-6256-0556-ac8c-d6d3d558820a@enterprisedb.com

Hi,

I did a bunch of tests on v15, mostly to assess how much the
prefetching could help. The most interesting test I did was this:

1) primary instance on a box with 16/32 cores, 64GB RAM, NVMe SSD

2) replica on small box with 4 cores, 8GB RAM, SSD RAID

3) pause replication on the replica (pg_wal_replay_pause)

4) initialize pgbench scale 2000 (fits into RAM on primary, while on
replica it's about 4x RAM)

5) run 1h pgbench: pgbench -N -c 16 -j 4 -T 3600 test

6) resume replication (pg_wal_replay_resume)

7) measure how long it takes to catch up, monitor lag

This is a nicely reproducible test case, and it eliminates the influence
of network speed and so on.
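
For anyone wanting to repeat steps 3, 6 and 7, here is a minimal sketch of
the lag monitoring. It is not the script used for the chart. The SQL
functions named in the constants are standard PostgreSQL ones, but
wait_for_catchup() and its fetch_lsns callable are hypothetical helpers:
fetch_lsns is assumed to run LSN_SQL on the replica with whatever driver
you prefer and return the two LSNs as text.

```python
import time

# SQL to run on the replica for steps 3, 6 and 7 (standard PostgreSQL functions).
PAUSE_SQL = "SELECT pg_wal_replay_pause()"
RESUME_SQL = "SELECT pg_wal_replay_resume()"
LSN_SQL = "SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()"


def lsn_to_bytes(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(receive_lsn: str, replay_lsn: str) -> int:
    """WAL that has been received but not yet replayed, in bytes."""
    return lsn_to_bytes(receive_lsn) - lsn_to_bytes(replay_lsn)


def wait_for_catchup(fetch_lsns, sleep=time.sleep, clock=time.monotonic,
                     interval=10.0):
    """Poll the replica until the replay lag reaches zero.

    fetch_lsns is a hypothetical callable returning the two text values
    produced by LSN_SQL. Returns (elapsed_seconds, samples), where samples
    is a list of (seconds_since_start, lag_bytes) pairs for charting.
    """
    start = clock()
    samples = []
    while True:
        receive_lsn, replay_lsn = fetch_lsns()
        lag = replay_lag_bytes(receive_lsn, replay_lsn)
        samples.append((clock() - start, lag))
        if lag <= 0:
            return clock() - start, samples
        sleep(interval)
```

The sleep and clock parameters are there only so the loop can be
exercised without a live replica. On the server side, pg_wal_lsn_diff()
does the same subtraction as replay_lag_bytes().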

Attached is a chart showing the lag with and without the prefetching. In
both cases we start with ~140GB of redo lag, and the chart shows how
quickly the replica applies that. The "waves" are checkpoints, where
right after a checkpoint the redo gets much faster thanks to FPIs and
then slows down as it gets to parts without them (having to do
synchronous random reads).

With master, it'd take ~16000 seconds to catch up. I don't have the
exact number, because I got tired of waiting, but the estimate is likely
accurate (judging by other tests and how regular the progress is).

With WAL prefetching enabled (I bumped up the buffer to 2MB, and
prefetch limit to 500, but that was a mostly arbitrary choice), it
finishes in ~3200 seconds. This includes replication of the pgbench
initialization, which took ~200 seconds and where prefetching is mostly
useless. That's a pretty damn nice improvement, I guess!

In a way, this means the tiny replica would be able to keep up with a
much larger machine, where everything is in memory.
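
To put rough numbers on that, here is a back-of-the-envelope sketch using
only the approximate figures quoted above (~140GB of lag, ~16000 s on
master, ~3200 s with prefetching). The helper names are made up for
illustration, 1 GB is assumed to be 1024 MB, and the master figure is an
estimate, so treat the results as ballpark values.

```python
# Approximate figures from the test described above.
REDO_LAG_GB = 140            # redo lag at the start of catch-up
MASTER_CATCHUP_S = 16_000    # estimated, the run was not completed
PREFETCH_CATCHUP_S = 3_200   # includes ~200 s of pgbench initialization


def speedup(baseline_s: float, improved_s: float) -> float:
    """How many times faster the improved run is than the baseline."""
    return baseline_s / improved_s


def avg_replay_rate_mb_s(lag_gb: float, seconds: float) -> float:
    """Average replay rate in MB/s, assuming 1 GB = 1024 MB."""
    return lag_gb * 1024 / seconds


if __name__ == "__main__":
    print(f"speedup: {speedup(MASTER_CATCHUP_S, PREFETCH_CATCHUP_S):.1f}x")
    print(f"master:   {avg_replay_rate_mb_s(REDO_LAG_GB, MASTER_CATCHUP_S):.2f} MB/s")
    print(f"prefetch: {avg_replay_rate_mb_s(REDO_LAG_GB, PREFETCH_CATCHUP_S):.2f} MB/s")
```

Under these assumptions the catch-up is about 5x faster, going from
roughly 9 MB/s to roughly 45 MB/s of WAL replayed on average.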

One comment about the patch - the postgresql.conf.sample change says:

#recovery_prefetch = on # whether to prefetch pages logged with FPW
#recovery_prefetch_fpw = off # whether to prefetch pages logged with FPW

but clearly that comment is only for recovery_prefetch_fpw, the first
GUC enables prefetching in general.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
image/png 19.0 KB
