Re: Standby trying "restore_command" before local WAL

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
Cc: Sergei Kornilov <sk(at)zsrv(dot)org>, Emre Hasegeli <emre(at)hasegeli(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "berge(at)trivini(dot)no" <berge(at)trivini(dot)no>, Gürkan Gür <ben(at)gurkan(dot)in>, Raimund Schlichtiger <raimund(dot)schlichtiger(at)innogames(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Bernhard Schrader <bernhard(dot)schrader(at)innogames(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>
Subject: Re: Standby trying "restore_command" before local WAL
Date: 2018-08-06 16:01:43
Message-ID: 20180806160143.GL27724@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Alexander Kukushkin (cyberdemn(at)gmail(dot)com) wrote:
> 2018-07-31 20:25 GMT+02:00 Stephen Frost <sfrost(at)snowman(dot)net>:
> > There's still a question here, at least from my perspective, as to which
> > is actually going to be faster to perform recovery based off of. A good
> > restore command, which pre-fetches the WAL in parallel and gets it local
> > and on the same filesystem, meaning that the restore_command only has to
> > execute essentially a 'mv' and return back to PG for the next WAL file,
> > is really rather fast, compared to streaming that same data over the
> > network with a single TCP connection to the primary. Of course, there's
> > a lot of variables there and it depends on the network speed between the
> > various pieces, but I've certainly had cases where a replica catches up
> > much faster using restore command than streaming from the primary.
>
> Sure, mv is incredibly fast, but not calling an external script/binary
> at all is still faster than calling it.

I don't believe I was disputing that, apologies if it came across that
way. Certainly, reading files directly without going through restore
command is going to be faster than having to call restore command. The
point I was attempting to make is that using restore command might be
(and in some cases, certainly is) faster than streaming from a primary.
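
As a rough illustration of the "essentially a 'mv'" restore command
described above, here is a minimal sketch. It assumes a separate
prefetcher (not shown) has already placed WAL segments in a spool
directory on the same filesystem as the destination. The function name
and the spool directory are hypothetical; only the idea of receiving the
segment name and destination path (what restore_command passes as %f and
%p) comes from PostgreSQL itself.

```python
import os


def restore_from_spool(spool_dir: str, wal_name: str, dest_path: str) -> int:
    """Move a pre-fetched WAL segment into place and return an exit code.

    spool_dir is a hypothetical directory filled by a separate prefetcher.
    wal_name and dest_path correspond to %f and %p in restore_command.
    """
    src = os.path.join(spool_dir, wal_name)
    try:
        # os.replace is a rename, so this only works (and is only cheap)
        # when spool_dir and dest_path are on the same filesystem.
        os.replace(src, dest_path)
    except FileNotFoundError:
        # A non-zero code tells the caller the segment is not available here.
        return 1
    return 0
```

A wrapper script would call this with the two arguments it receives and
exit with the returned code. The per-segment cost is then one process
start plus one rename, which is the case being compared against a single
streaming connection.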

> What about the following cases?
> 1. replica host crashed, and in pg_wal we have a few thousand WAL files.

If this is the case then the replica was very far behind on replay,
presumably, and in some of those cases rebuilding the replica might
very well be faster than replaying all of that WAL. This case does
sound like it should be alright though.

> 2. we are creating a new replica with pg_basebackup -X stream, it
> takes a long time and again leaves a few thousand WAL files.

This is certainly typical and should also be a safe case, so it seems
like a good case where we'd want to be able to tell the system to use
what's in pg_wal first. Perhaps that could be an option in recovery.conf,
which pg_basebackup and other tools that manage the pg_wal directory and
ensure that all the WAL there is valid would be able to write.

> In both cases, if there is no restore_command in the recovery.conf,
> postgres will happily read WAL files from pg_wal and only when there
> is nothing left it will try to start streaming.
>
> But, if restore_command is defined, it will always call the
> restore_command, for every single WAL file it wants to restore.
> If the restore_command exits with non zero exit code, postgres is
> happily restoring the file from pg_wal!
> And, only if the file is not there or not valid, postgres is trying to
> start streaming.

Yeah, I have to agree that it's not great that we don't seem to be
entirely consistent here, as Robert pointed out up-thread.
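
To make the ordering question concrete, here is a toy model of the
fallback chain as it is described in the quoted text. It is only a
sketch of the behaviour discussed in this thread, not PostgreSQL's
actual code; all names (fetch_segment, the three stand-in sources, the
segment name) are made up for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A source takes a WAL segment name and returns its bytes, or None on a miss.
Source = Callable[[str], Optional[bytes]]


def fetch_segment(
    name: str, sources: Sequence[Tuple[str, Source]]
) -> Optional[Tuple[str, bytes]]:
    """Try each source in order; return (label, data) for the first hit."""
    for label, source in sources:
        data = source(name)
        if data is not None:
            return label, data
    return None


SEG = "000000010000000000000001"
pg_wal = {SEG: b"local"}   # stand-in for segments already in pg_wal
calls: List[str] = []      # records each simulated external call


def restore_command(name: str) -> Optional[bytes]:
    calls.append(name)     # every attempt costs one external invocation
    return None            # simulate a non-zero exit code


def local_pg_wal(name: str) -> Optional[bytes]:
    return pg_wal.get(name)


def streaming(name: str) -> Optional[bytes]:
    return None            # simulate "primary not reachable"


current_order = [("restore_command", restore_command),
                 ("pg_wal", local_pg_wal),
                 ("streaming", streaming)]
proposed_order = [("pg_wal", local_pg_wal),
                  ("restore_command", restore_command),
                  ("streaming", streaming)]
```

In this model both orders end up reading the segment from pg_wal; the
only difference is that current_order first records a failed
restore_command call for it, which is the overhead being objected to.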

> From my point of view, there is no difference between having no
> restore_command and relying only on streaming replication and having
> a restore_command which always fails.
> Therefore I don't really understand why we stick to the
> "restore_command => pg_wal => streaming" and why it is not possible to
> change it to "pg_wal => restore_command => streaming" or maybe even
> (pg_wal => streaming => restore_command).

I don't think I disagreed anywhere about having the option. There's a
good point to be made that if we can figure out what the right thing to
do is then we should just do that instead of having an option for it.

If there's any case where the pg_wal directory might have invalid WAL
to be replayed over top of the current cluster, though, then we
shouldn't just be using that WAL and instead should be asking the user
to let us know if the WAL is ok to use. If we can know when the WAL is
invalid and ignore using it in those cases, then we should just go ahead
and do that, but I'm unconvinced that's actually the case in a situation
such as what David Steele described in his scenario #2.
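
The condition argued for here can be sketched as a small decision
function. This is only an illustration under the assumption that some
explicit signal exists saying the contents of pg_wal are known to be
valid (for example, written by the tool that populated the directory);
the flag name pg_wal_known_valid is hypothetical and is not an existing
PostgreSQL setting.

```python
from typing import List


def wal_source_order(pg_wal_known_valid: bool) -> List[str]:
    """Return the order in which WAL sources would be tried.

    pg_wal_known_valid is a hypothetical flag, not an existing setting.
    """
    if pg_wal_known_valid:
        # A tool that manages pg_wal has vouched for its contents,
        # so local WAL can be used before anything remote.
        return ["pg_wal", "restore_command", "streaming"]
    # Otherwise keep the order described in this thread as current behaviour.
    return ["restore_command", "pg_wal", "streaming"]
```

The point of the sketch is that the order change is gated on an explicit
statement of validity rather than being applied unconditionally.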

> I am not sure about the last option, but in any case, before going to
> some remote place, postgres should try to find (and try to replay) the
> WAL file in pg_wal.

Only if we know that it's valid to be replayed over the current cluster.

Thanks!

Stephen
