Re: Standby trying "restore_command" before local WAL

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Emre Hasegeli <emre(at)hasegeli(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, berge(at)trivini(dot)no, Gürkan Gür <ben(at)gurkan(dot)in>, Raimund Schlichtiger <raimund(dot)schlichtiger(at)innogames(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Bernhard Schrader <bernhard(dot)schrader(at)innogames(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>
Subject: Re: Standby trying "restore_command" before local WAL
Date: 2018-08-03 19:55:39
Message-ID: 20180803195539.GA20967@paquier.xyz
Lists: pgsql-hackers

On Tue, Jul 31, 2018 at 02:55:58PM +0200, Emre Hasegeli wrote:
> == The Workarounds ==
>
> We can possibly work around this inside the "restore_command" or
> by delaying the archiving. Working around inside the "restore_command"
> would involve checking whether the file exists under pg_wal/. This
> should not be easy because the WAL file may be written partially. It
> should be easier for Postgres to do this as it knows where to stop
> processing the local WAL.

It is also not that complicated to check whether a WAL segment is
well-formed by running pg_waldump or a similar tool on it, so perhaps
that would be enough to cover all your cases on the back-branches?
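
As a rough illustration of that idea, here is a minimal sketch of the
check a "restore_command" wrapper could make. It assumes pg_waldump is
on PATH and that its exit status is a good enough signal for a
partially written segment, which I have not verified on every branch.
The names waldump_ok and choose_segment and the directory layout are
made up for the example.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def waldump_ok(segment: Path) -> bool:
    """Return True if pg_waldump reads the segment and exits with status 0.

    Assumption: pg_waldump is on PATH and fails (non-zero exit) on a
    partially written or otherwise malformed segment.
    """
    try:
        result = subprocess.run(
            ["pg_waldump", str(segment)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # pg_waldump missing or not executable: treat the segment as unverified.
        return False
    return result.returncode == 0


def choose_segment(
    wal_dir: Path,
    archive_dir: Path,
    name: str,
    validate: Callable[[Path], bool] = waldump_ok,
) -> Optional[Path]:
    """Prefer a validated local segment, else fall back to the archive copy."""
    local = Path(wal_dir) / name
    if local.is_file() and validate(local):
        return local
    archived = Path(archive_dir) / name
    if archived.is_file():
        return archived
    return None
```

A wrapper built on this could exit with a non-zero status whenever the
local copy is chosen, so that recovery falls back to the file already
in pg_wal/, and copy from the archive otherwise. That part is left out
here because it depends on how the archive is reached.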

> == The Change ==
>
> This "restore_command" behavior is coming from the initial archiving
> and point-in-time-recovery implementation [2]. The code says
> "the reason is that the file in XLOGDIR could be an old, un-filled or
> partly-filled version that was copied and restored as part of
> backing up $PGDATA." This was probably a good reason in 2004, but
> I don't think it still is. AFAIK "pg_basebackup" eliminates this
> problem.

pg_basebackup is not the only backup solution, though. I'd like folks
to use it more, but it can be a bottleneck and it still comes with its
own limitations, for example when streaming tar data with multiple
tablespaces...

> Also, with this reasoning, we should also try streaming from the
> master before trying the local WAL, but AFAIU we don't.

... You have a point here, things are rather inconsistent by this
argument. I have not looked at that in detail, but at least
WaitForWALToBecomeAvailable(), which enforces XLOG_FROM_ARCHIVE when the
current source is XLOG_FROM_PG_WAL, would need to be changed.
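
To make the ordering question concrete, here is a toy model of "try
each WAL source in turn and take the first one that delivers the
segment". It is not the PostgreSQL code: the two orderings are
simplifications of what is described in this thread, and first_source
and the fetcher callables are invented for the example.

```python
from typing import Callable, Dict, List, Optional

# Labels borrowed from the discussion; hypothetical simplifications only.
CURRENT_ORDER: List[str] = ["archive", "pg_wal", "stream"]
LOCAL_FIRST_ORDER: List[str] = ["pg_wal", "archive", "stream"]


def first_source(
    order: List[str],
    fetchers: Dict[str, Callable[[str], bool]],
    segment: str,
) -> Optional[str]:
    """Return the first source in `order` whose fetcher reports success."""
    for source in order:
        if fetchers[source](segment):
            return source
    return None


if __name__ == "__main__":
    # The segment exists both in the archive and locally, but not upstream.
    fetchers = {
        "archive": lambda seg: True,
        "pg_wal": lambda seg: True,
        "stream": lambda seg: False,
    }
    seg = "000000010000000000000001"
    print(first_source(CURRENT_ORDER, fetchers, seg))
    print(first_source(LOCAL_FIRST_ORDER, fetchers, seg))
```

With the segment available in both places, the first ordering goes to
the archive (and so runs "restore_command") even though a usable local
copy exists, which is the behavior being questioned here.
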
--
Michael
