Re: WIP: WAL prefetch (another approach)

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2020-09-05 00:05:52
Message-ID: CA+hUKGLr6nU7yb8CO7_DEppBmPCOkCUvE6HBza8himYV_s5nuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 2, 2020 at 2:18 AM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On Wed, Sep 02, 2020 at 02:05:10AM +1200, Thomas Munro wrote:
> >On Wed, Sep 2, 2020 at 1:14 AM Tomas Vondra
> ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >> from the archive
> >
> >Ahh, so perhaps that's the key.
>
> Maybe. For the record, the commands look like this:
>
> archive_command = 'gzip -1 -c %p > /mnt/raid/wal-archive/%f.gz'
>
> restore_command = 'gunzip -c /mnt/raid/wal-archive/%f.gz > %p.tmp && mv %p.tmp %p'

Yeah, sorry, I goofed here by not considering archive recovery
properly. I have special handling for crash recovery from files in
pg_wal (XLRO_END, means read until you run out of files) and streaming
replication (XLRO_WALRCV_WRITTEN, means read only as far as the wal
receiver has advertised as written in shared memory), as a way to
control the ultimate limit on how far ahead to read when
maintenance_io_concurrency and max_recovery_prefetch_distance don't
limit you first. But if you recover from a base backup with a WAL
archive, it uses the XLRO_END policy which can run out of files just
because a new file hasn't been restored yet, so it gives up
prefetching too soon, as you're seeing. That doesn't cause any
damage, but it stops doing anything useful because the prefetcher
thinks its job is finished.

It'd be possible to fix this somehow in the two-XLogReader design, but
since I'm testing a new version that has a unified
XLogReader-with-read-ahead I'm not going to try to do that. I've
added a basebackup-with-archive recovery to my arsenal of test
workloads to make sure I don't forget about archive recovery mode
again, but I think it's actually harder to get this wrong in the new
design. In the meantime, if you are still interested in studying the
potential speed-up from WAL prefetching using the most recently shared
two-XLogReader patch, you'll need to unpack all your archived WAL
files into pg_wal manually beforehand.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-09-05 00:13:32 Re: A micro-optimisation for walkdir()
Previous Message Thomas Munro 2020-09-04 23:15:07 Re: A micro-optimisation for walkdir()