Re: Standby trying "restore_command" before local WAL

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Emre Hasegeli <emre(at)hasegeli(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "berge(at)trivini(dot)no" <berge(at)trivini(dot)no>, Gürkan Gür <ben(at)gurkan(dot)in>, Raimund Schlichtiger <raimund(dot)schlichtiger(at)innogames(dot)com>, Bernhard Schrader <bernhard(dot)schrader(at)innogames(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>
Subject: Re: Standby trying "restore_command" before local WAL
Date: 2018-08-06 16:11:56
Message-ID: 20180806161156.GM27724@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
> On 08/06/2018 05:19 PM, Stephen Frost wrote:
> >* David Steele (david(at)pgmasters(dot)net) wrote:
> >>I think for the stated scenario (known good standby that has been
> >>shutdown gracefully) it makes perfect sense to trust the contents of
> >>pg_wal. Call this scenario #1.
> >>
> >>An alternate scenario (#2) is that the data directory was copied using a
> >>basic copy tool and the pg_wal directory was not excluded from the copy.
> >>This means the contents of pg_wal will be in an inconsistent state.
> >>The files that are there might be partials (though without the .partial
> >>extension) and you can easily have multiple partials. You will almost
> >>certainly not have everything you need to get to consistency.
>
> Yeah. But as Simon said, we do have fairly strong protections against applying
> corrupted WAL - every record is CRC-checked. So why not fall back to
> restore_command only if the locally available WAL is not fully consistent?

"Corrupted" doesn't necessarily only mean "the data file was munged by
the storage somehow." In this case, corrupted could be an old and only
partial WAL file, in which case we'd possibly be missing WAL that needs
to be replayed to bring the cluster back to a valid state, no?
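
Tomas's proposal (read from pg_wal first and fall back to restore_command only when a local record fails its CRC check) can be sketched with a toy model. Everything here is illustrative: the record format, the integer positions and the `local`/`restore` fetchers are hypothetical stand-ins rather than PostgreSQL's xlogreader interfaces, and zlib.crc32 (plain CRC-32) substitutes for the CRC-32C PostgreSQL actually uses.

```python
import zlib
from typing import Callable, Dict, Optional, Tuple

# Toy WAL record: (payload, stored checksum). Hypothetical format; PostgreSQL
# checksums the record header and data with CRC-32C, not zlib.crc32.
Record = Tuple[bytes, int]
Fetch = Callable[[int], Optional[Record]]


def make_record(payload: bytes) -> Record:
    return (payload, zlib.crc32(payload))


def crc_ok(rec: Record) -> bool:
    payload, stored = rec
    return zlib.crc32(payload) == stored


def read_record(pos: int, local: Fetch, restore: Fetch) -> Tuple[str, Optional[Record]]:
    """Local-first read: use the pg_wal copy if present and its CRC matches,
    otherwise ask the (hypothetical) restore_command fetcher."""
    rec = local(pos)
    if rec is not None and crc_ok(rec):
        return ("pg_wal", rec)
    return ("restore_command", restore(pos))


# Demo data: the archive holds records 0 and 1. The local copy of record 1
# has a single flipped bit ('B' -> 'C'), so its stored CRC no longer matches.
archive: Dict[int, Record] = {0: make_record(b"insert A"), 1: make_record(b"insert B")}
local_wal: Dict[int, Record] = {0: archive[0], 1: (b"insert C", archive[1][1])}

for pos in (0, 1):
    source, _ = read_record(pos, local_wal.get, archive.get)
    print(pos, source)
```

A check like this only detects bytes that were damaged; it says nothing about whether an intact local record is the one that ought to be replayed at that position, which is the "old and only partial WAL file" concern.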

> >>But there's another good scenario (#3): where the pg_wal directory was
> >>preloaded with all the WAL required to make the cluster consistent or
> >>all the WAL that was available at restore time. In this case, it would
> >>make sense to prefer the contents of pg_wal and only switch to
> >>restore_command after that has been exhausted.
> >>
> >>So, the choice of whether to prefer locally-stored or
> >>restore_command-fetched WAL is context-dependent, in my mind.
> >
> >Agreed.
>
> Maybe, not sure.

The situation David describes in scenario #2 looks entirely plausible to
me, and I don't think we've got any real protections against it. The
common use-cases today happen to avoid the risk: tools like
pg_basebackup ignore the existing pg_wal directory when taking the
backup and instead populate it with exactly the WAL that's needed, and
when a restore_command is specified it will only ever pull back valid
WAL. But I don't think we can decide that this scenario (#2 from above):

#####
pg_start_backup
copy all files (including pg_wal)
pg_stop_backup
have a recovery.conf with a valid and correct restore_command set
start PG
#####

isn't something to worry about. Maybe I'm all wet, and changing our
existing preference so that we read from pg_wal first and only then go
to restore_command would still work properly with this scenario, but I
don't think that's the case. I haven't tested it, though; I'm just going
off what I recall and what I understood those comments to be talking about.

If the above pseudo-code works just fine with preferring pg_wal over
restore_command, then fine, let's just make that change. If it doesn't,
though, then I don't think we can make that change and instead we need
to either figure out some way to determine what's acceptable to pull
from pg_wal and what we have to ask restore_command for, or ask the user
to explicitly tell us if we can use pg_wal first.
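
The "ask the user" option, combined with the per-scenario defaults David suggests, amounts to a small decision function. This is only a sketch under stated assumptions: the Scenario names and the `prefer_local` parameter are hypothetical (think of it as a possible GUC), and the function takes the scenario as an input, which sidesteps the open question of how the server could detect it automatically.

```python
from enum import Enum
from typing import List, Optional


class Scenario(Enum):
    CLEAN_STANDBY = 1   # #1: known-good standby, shut down gracefully
    NAIVE_COPY = 2      # #2: pg_wal copied along with the data directory
    PRELOADED_WAL = 3   # #3: pg_wal deliberately preloaded with the needed WAL


LOCAL_FIRST = ["pg_wal", "restore_command"]
FETCH_FIRST = ["restore_command", "pg_wal"]


def wal_source_order(scenario: Scenario, prefer_local: Optional[bool] = None) -> List[str]:
    """Return the order in which WAL sources are tried.

    prefer_local stands in for a hypothetical user setting; None means
    "no explicit choice, use the default for this scenario".
    """
    if prefer_local is not None:
        # An explicit user choice always wins.
        return list(LOCAL_FIRST if prefer_local else FETCH_FIRST)
    if scenario is Scenario.CLEAN_STANDBY:
        return list(LOCAL_FIRST)
    # #2, and #3 without an explicit override, get the conservative default.
    return list(FETCH_FIRST)


for s in Scenario:
    print(s.name, wal_source_order(s))
print("PRELOADED_WAL + override", wal_source_order(Scenario.PRELOADED_WAL, prefer_local=True))
```

Keeping the conservative order as the fallback means that a wrong guess about the scenario costs extra fetches rather than replaying the wrong WAL.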

> >>Ideally we could have a default that is safe in each scenario with
> >>perhaps an override if the user knows better. Scenario #1 would allow
> >>WAL to be read from pg_wal by default, scenario #2 would prefer fetched
> >>WAL, and scenario #3 could use a GUC to override the default fetch behavior.
> >
> >Not sure how we'd be able to automatically realize which scenario we're
> >in though..?
>
> But do we need to know it? I mean, can't we try the local WAL first, use it
> if it passes the CRC checks (and possibly some other checks), and only
> fall back to the remote WAL if it's identified as broken?

Maybe, but I think we need to be quite sure about that and I don't
believe that just checking the CRCs is enough.
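
To illustrate why a passing CRC may not be enough on its own, here is another toy model. The record layout, field names and positions are hypothetical, loosely inspired by PostgreSQL's xl_prev back-link and page-address checks rather than copied from them. A leftover record in a recycled pg_wal segment is internally intact, so its checksum verifies, and only a positional check reveals that it does not belong where replay expects it.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Rec:
    lsn: int        # position the record claims to live at (toy value)
    prev: int       # position of the preceding record (cf. xl_prev)
    payload: bytes
    crc: int


def _body(lsn: int, prev: int, payload: bytes) -> bytes:
    # Toy checksum input: header fields followed by the payload.
    return lsn.to_bytes(8, "little") + prev.to_bytes(8, "little") + payload


def mk(lsn: int, prev: int, payload: bytes) -> Rec:
    return Rec(lsn, prev, payload, zlib.crc32(_body(lsn, prev, payload)))


def crc_ok(r: Rec) -> bool:
    # Integrity only: were these bytes written together, unmodified?
    return zlib.crc32(_body(r.lsn, r.prev, r.payload)) == r.crc


def chain_ok(r: Rec, expected_lsn: int, expected_prev: int) -> bool:
    # Position: is this record where, and after what, replay expects it?
    return r.lsn == expected_lsn and r.prev == expected_prev


# Replay has applied the record at position 100 and expects the next one at 200.
# The archived record is the real one; the local segment was recycled and the
# bytes at that offset still hold a record from an earlier cycle.
current = mk(200, 100, b"current change")
stale = mk(40, 20, b"old change")

for name, rec in (("archive", current), ("pg_wal", stale)):
    print(name, "crc_ok:", crc_ok(rec), "chain_ok:", chain_ok(rec, 200, 100))
```

PostgreSQL's WAL reader already applies checks of this positional kind, so this does not by itself show that reading pg_wal first is unsafe; it only shows that the CRC is not the whole story.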

Thanks!

Stephen
