Re: Standby trying "restore_command" before local WAL

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Sergei Kornilov <sk(at)zsrv(dot)org>
Cc: Emre Hasegeli <emre(at)hasegeli(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "berge(at)trivini(dot)no" <berge(at)trivini(dot)no>, Gürkan Gür <ben(at)gurkan(dot)in>, Raimund Schlichtiger <raimund(dot)schlichtiger(at)innogames(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Bernhard Schrader <bernhard(dot)schrader(at)innogames(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>
Subject: Re: Standby trying "restore_command" before local WAL
Date: 2018-07-31 18:25:46
Message-ID: 20180731182546.GD27724@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Sergei Kornilov (sk(at)zsrv(dot)org) wrote:
> > As mentioned by others, it sounds like we could have an option to try
> > contacting the primary before running restore_command
> Why the primary?
> If we have restore_command on a slave (or during point-in-time recovery), we force XLOG_FROM_ARCHIVE even if the XLOG_FROM_PG_WAL source can provide the next WAL. As the xlog.c comment says [1]:

Right..

> > * We just successfully read a file in pg_wal. We prefer files in
> > * the archive over ones in pg_wal, so try the next file again
> > * from the archive first.
>
> Do we have an actual reason to prefer restore_command over using local WAL files first?

Yes, as discussed in the comments mentioned up-thread.

> Partially written WAL? Streaming replication can leave partially written WAL and we can handle this correctly.

Sure, though even in that case there seems to be a reasonable use-case
here for an option to control if restore_command is used to get the next
needed WAL or if the primary should be asked for the WAL first.

There's still a question here, at least from my perspective, as to which
source recovery would actually be faster from. A good restore_command,
one that pre-fetches the WAL in parallel and stages it locally on the
same filesystem so that the restore_command itself only has to do
essentially a 'mv' and return to PG for the next WAL file, is really
rather fast compared to streaming that same data over the network with
a single TCP connection to the primary. Of course, there are a lot of
variables there, and it depends on the network speed between the various
pieces, but I've certainly had cases where a replica caught up much
faster using restore_command than by streaming from the primary.
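
To make the 'mv' case concrete, here is a rough sketch of what such a
restore_command helper might look like. It assumes some separate
process (not shown) has already pre-fetched WAL segments into a spool
directory on the same filesystem as the data directory. The script
name, the restore_wal function, and the spool path are all
hypothetical; only the %f and %p placeholders come from PostgreSQL.

```python
import os
import sys


def restore_wal(spool_dir: str, wal_name: str, dest_path: str) -> int:
    """Move a pre-fetched WAL segment into place.

    Returns a shell-style exit code: 0 on success, 1 if the segment
    is not in the spool directory.
    """
    src = os.path.join(spool_dir, wal_name)
    if not os.path.isfile(src):
        # A non-zero exit tells PostgreSQL the segment is not available
        # from the archive, so it can move on to another WAL source.
        return 1
    # os.replace is a plain rename when src and dest_path share a
    # filesystem; across filesystems it raises OSError, so this sketch
    # relies on the same-filesystem assumption stated above.
    os.replace(src, dest_path)
    return 0


# Hypothetical wiring in the standby's configuration:
#   restore_command = 'python3 restore_wal.py /path/to/spool %f %p'
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(restore_wal(sys.argv[1], sys.argv[2], sys.argv[3]))
```

The point is only that the per-segment cost here is a rename plus
process startup; all the expensive transfer work happens in the
pre-fetcher, which can run in parallel.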

Thanks!

Stephen
