Re: Standby trying "restore_command" before local WAL

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Emre Hasegeli <emre(at)hasegeli(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Sergei Kornilov <sk(at)zsrv(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "berge(at)trivini(dot)no" <berge(at)trivini(dot)no>, Gürkan Gür <ben(at)gurkan(dot)in>, Raimund Schlichtiger <raimund(dot)schlichtiger(at)innogames(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Bernhard Schrader <bernhard(dot)schrader(at)innogames(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>
Subject: Re: Standby trying "restore_command" before local WAL
Date: 2018-08-03 13:41:05
Message-ID: CANP8+jLjV9AOwojPzUuCNX4_GLd-5Tx-XK8984kbttHQWm3Jjw@mail.gmail.com
Lists: pgsql-hackers

On 2 August 2018 at 21:08, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Aug 1, 2018 at 7:14 AM, Emre Hasegeli <emre(at)hasegeli(dot)com> wrote:
>>> There's still a question here, at least from my perspective, as to which
>>> is actually going to be faster to perform recovery based off of. A good
>>> restore command, which pre-fetches the WAL in parallel and gets it local
>>> and on the same filesystem, meaning that the restore_command only has to
>>> execute essentially a 'mv' and return back to PG for the next WAL file,
>>> is really rather fast, compared to streaming that same data over the
>>> network with a single TCP connection to the primary. Of course, there's
>>> a lot of variables there and it depends on the network speed between the
>>> various pieces, but I've certainly had cases where a replica catches up
>>> much faster using restore command than streaming from the primary.
>>
>> Trying "restore_command" before streaming replication is totally fine.
>> It is not likely that the same WAL would be on both places anyway.
>>
>> My problem is trying "restore_command" before the local WAL. I
>> understand the historic reason of this design, but I don't think it is
>> expected behavior to anybody who is using "restore_command" together
>> with streaming replication.
>
> Right. I don't really understand the argument that this should be
> controlled by a GUC. I could see having a GUC to choose between
> archiving-first and streaming-first, but if it's safe to use the WAL
> we've already got in pg_xlog, it seems like that should take priority
> over every other approach. The comments lend themselves to a certain
> amount of doubt over whether we can actually trust the contents of
> pg_xlog, but if we can't, it seems like we just shouldn't use it - at
> all - ever. It doesn't make much sense to me to say "hey, pg_xlog
> might have evil bad WAL in it that we shouldn't replay, so let's look
> for the same WAL elsewhere first, but then if we don't find it, we'll
> just replay the bad stuff." I might be missing something, but that
> sounds a lot like "hey, this mysterious gloop I found might be rat
> poison, so let me go check if there's some fruit in the fruit bowl,
> but if I don't find any, I'm just going to eat the mysterious gloop."

The existing mechanism is designed to recover from data loss and looks
to still be a safe default.

If you have some data corruption somewhere you will want to trust the
archive copy rather than the pg_wal copy. If you don't have a copy in
the archive and the *only* copy you have is in pg_wal then we attempt
to trust it, bearing in mind each WAL record is CRC checked and
unlikely to pass if there is noticeable corruption. If you have strong
doubts about the contents of pg_wal, then you can simply delete the
files from pg_wal before you start, so you do effectively have a level
of control over how much you trust the files there.

That default wasn't ever changed when we introduced streaming, hence
the complaint.

I guess what we could say is, if the user has both streaming and
restore_command configured, then trust pg_wal. That would solve the
problem without a new parameter. I can see cases where you might
still want control but, as Stephen says, a good restore_command
script will sort those cases out.
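
To make the rule concrete, here is a minimal sketch of the source
ordering discussed in this thread. It is not PostgreSQL code: the
function name, the flag names, and the source labels are all
hypothetical, and the "existing default" branch only restates the
behavior described in this thread (archive first, then pg_wal).

```python
from typing import List


def wal_source_order(has_restore_command: bool, has_streaming: bool) -> List[str]:
    """Return the order in which a standby would try WAL sources.

    Illustrative only; the labels are hypothetical names, not
    identifiers from the PostgreSQL source tree.
    """
    sources: List[str] = []
    if has_restore_command and has_streaming:
        # Rule suggested above: with both configured, trust pg_wal first.
        sources += ["pg_wal", "archive"]
    elif has_restore_command:
        # Existing default as described in the thread: archive copy first.
        sources += ["archive", "pg_wal"]
    else:
        # No restore_command, so the local copy is the only file source.
        sources += ["pg_wal"]
    if has_streaming:
        # Streaming is tried after the file-based sources in every case.
        sources.append("stream")
    return sources
```

The point of the sketch is that the ordering can be derived from
the two existing settings, so no extra parameter is needed.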

If we trust pg_wal over the archive, you still need to solve the
problem of what happens if pg_wal is behind the archive: when you
hit the end of pg_wal you would need to trap that error and flip back
to requesting any missing files, including the existing WAL file,
from the archive. That sounds like a challenge, parameter or not. A
better alternative might be to pre-scan each file in pg_wal and, if it
is either not present or in some way corrupt, try to get it from the
archive. That is more of an exception-handling situation, though it
still needs some code.
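
The pre-scan idea could look roughly like the sketch below. It
rests on several assumptions: the helper names are hypothetical, the
default validity check only compares the file size against an assumed
16 MiB segment size, and a real implementation would need to verify
the per-record CRCs instead.

```python
import os
from typing import Callable, Iterable, List

# Assumption: the default WAL segment size; clusters can be built differently.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def looks_intact(path: str, expected_size: int = WAL_SEGMENT_SIZE) -> bool:
    """Cheap stand-in for a real check: only compares the file size."""
    return os.path.getsize(path) == expected_size


def segments_to_fetch(
    pg_wal_dir: str,
    wanted: Iterable[str],
    is_valid: Callable[[str], bool] = looks_intact,
) -> List[str]:
    """Return the wanted segment names that are missing or fail is_valid.

    These are the files that would be requested from the archive; the
    rest would be replayed from the local directory.
    """
    missing_or_bad: List[str] = []
    for name in wanted:
        path = os.path.join(pg_wal_dir, name)
        if not os.path.isfile(path) or not is_valid(path):
            missing_or_bad.append(name)
    return missing_or_bad
```

Anything returned by segments_to_fetch() would go to
restore_command, which keeps the archive as the exception path
rather than the first stop.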

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
