Re: WIP: WAL prefetch (another approach)

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2020-11-18 21:00:39
Message-ID: 20201118210039.GP16415@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Thomas Munro (thomas(dot)munro(at)gmail(dot)com) wrote:
> On Sat, Nov 14, 2020 at 4:13 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > * Tomas Vondra (tomas(dot)vondra(at)enterprisedb(dot)com) wrote:
> > > On 11/13/20 3:20 AM, Thomas Munro wrote:
> > > > I'm not really sure what to do about archive restore scripts that
> > > > block. That seems to be fundamentally incompatible with what I'm
> > > > doing here.
> > >
> > > IMHO we can't do much about that, except for documenting it - if the
> > > prefetch can't work because of blocking restore script, someone has to
> > > fix/improve the script. No way around that, I'm afraid.
> >
> > I'm a bit confused about what the issue here is- is the concern that a
> > restore_command is specified that isn't allowed to run concurrently but
> > this patch is intending to run more than one concurrently..? There's
> > another patch that I was looking at for doing pre-fetching of WAL
> > segments, so if this is also doing that we should figure out which
> > patch we want..
>
> The problem is that the recovery loop tries to look further ahead in
> between applying individual records, which causes the restore script
> to run, and if that blocks, we won't apply records that we already
> have, because we're waiting for the next WAL file to appear. This
> behaviour is on by default with my patch, so pg_standby will introduce
> weird replay delays. We could think of some ways to fix that, with
> meaningful return codes and periodic polling or something, I suppose,
> but something feels a bit weird about it.

Ah, yeah, that's clearly an issue that should be addressed. There's a
nearby thread which is talking about doing exactly that, so, perhaps
this doesn't need to be worried about here..?
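
For illustration only, the "meaningful return codes and periodic polling"
idea quoted above might look roughly like the sketch below. Everything in
it is hypothetical: RestoreStatus, the NOT_YET result, and the
try_restore_next callback are made-up names and not part of any patch or
of the existing restore_command contract. The point is just that the
look-ahead returns immediately instead of blocking, so records already in
hand keep getting applied.

```python
from collections import deque
from enum import Enum


class RestoreStatus(Enum):
    FOUND = "found"      # the next segment was restored
    NOT_YET = "not_yet"  # hypothetical "try again later" result


def replay_with_polling(have, try_restore_next):
    """Apply records already in hand, polling for more between records.

    `try_restore_next` is a hypothetical non-blocking callback returning
    (RestoreStatus, list_of_new_records).
    """
    pending = deque(have)
    applied = []
    while pending:
        # Look ahead without blocking: NOT_YET only means "ask again later".
        status, records = try_restore_next()
        if status is RestoreStatus.FOUND:
            pending.extend(records)
        # Replay never waits on the archive while records are available.
        applied.append(pending.popleft())
    return applied
```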

> > I don't know that it's needed, but it feels likely that we could provide
> > a better result if we consider making changes to the restore_command API
> > (eg: have a way to say "please fetch this many segments ahead, and you
> > can put them in this directory with these filenames" or something). I
> > would think we'd be able to continue supporting the existing API and
> > accept that it might not be as performant.
>
> Hmm. Every time I try to think of a protocol change for the
> restore_command API that would be acceptable, I go around the same
> circle of thoughts about event flow and realise that what we really
> need for this is ... a WAL receiver...

A WAL receiver, or an independent process which goes out ahead and
fetches WAL..?

Still, I wonder about having a way to inform the command that's run by
the restore_command of what it is we really want, eg:

restore_command = 'somecommand --async=%a --target=%t --target-name=%n --target-xid=%x --target-lsn=%l --target-timeline=%i --dest-dir=%d'

Such that '%a' is either yes or no, indicating whether the restore command
should operate asynchronously and pre-fetch WAL, %t is either empty (or
maybe 'unset') or 'immediate', %n/%x/%l are similar to %t, %i is either
a specific timeline or 'immediate' (somecommand should be understanding
of timelines and know how to parse history files to figure out the right
timeline to fetch along, based on the destination requested), and %d is
a directory for somecommand to place WAL files into (perhaps with an
alternative naming scheme, if we feel we need one).
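
To make that a bit more concrete, here is a minimal sketch of how the
receiving side of such a command line might be parsed. It is only an
illustration under assumptions: 'somecommand' is the placeholder name used
above, the flag names are copied from the example line, and none of %a,
%t, %n, %x, %l, %i or %d exist in restore_command today (which only knows
%f, %p, %r and %%). The plan_fetch() helper and its keys are invented for
the example.

```python
import argparse


def parse_args(argv):
    """Parse the hypothetical flags from the example restore_command."""
    parser = argparse.ArgumentParser(prog="somecommand")
    # 'async' is a Python keyword, so store the value under another name.
    parser.add_argument("--async", dest="run_async",
                        choices=["yes", "no"], default="no")
    # Recovery target options; an empty string means "not set".
    parser.add_argument("--target", default="")
    parser.add_argument("--target-name", default="")
    parser.add_argument("--target-xid", default="")
    parser.add_argument("--target-lsn", default="")
    parser.add_argument("--target-timeline", default="")
    # Directory the command is allowed to pre-populate with WAL files.
    parser.add_argument("--dest-dir", required=True)
    return parser.parse_args(argv)


def plan_fetch(args):
    """Summarise what the command was asked to do (illustrative only)."""
    return {
        "prefetch": args.run_async == "yes",
        "stop_at_consistency": args.target == "immediate",
        "dest_dir": args.dest_dir,
    }
```

How far ahead to fetch, and with how many workers, is deliberately left
out here.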

The amount of pre-fetching which 'somecommand' would do, and how many
processes it would use to do so, could either be configured as part of
the options passed to 'somecommand', which we would just pass through,
or through its own configuration file.

A restore_command which is set but doesn't include a %a or %d or such
would be assumed to work in the same manner as today.
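
As a rough sketch of that fallback rule, and nothing more than that: the
placeholder letters below are the proposed ones from the example, the
function names are made up, and this is not how the server's existing
command expansion is written. It only shows that telling an old-style
command from a new-style one can be a simple scan of the template, with
"%%" still treated as a literal percent sign.

```python
import re

# Hypothetical new placeholders from the proposal; not supported today.
EXTENDED_PLACEHOLDERS = set("atnxlid")


def uses_extended_api(template):
    """True if the template contains any of the proposed placeholders."""
    # Scanning left to right consumes '%%' as one token, so the character
    # after an escaped percent sign is not mistaken for a placeholder.
    return any(ch in EXTENDED_PLACEHOLDERS
               for ch in re.findall(r"%(.)", template))


def expand(template, values):
    """Substitute known placeholders; leave unknown ones untouched.

    No shell quoting is done here; a real implementation would need it.
    """
    def substitute(match):
        key = match.group(1)
        if key == "%":
            return "%"
        return values.get(key, match.group(0))
    return re.sub(r"%(.)", substitute, template)
```

With this, 'cp /archive/%f %p' is classified as the existing API and
handled exactly as today, while a template containing %a or %d opts in to
the new behaviour.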

For my part, at least, I don't think this is really that much of a
stretch, to expect a restore_command to be able to pre-populate a
directory with WAL files- certainly there's at least one that already
does this, even though it doesn't have all the information directly
passed to it.. Would be nice if it did. :)

Thanks,

Stephen
