Re: pg_stop_backup(wait_for_archive := true) on standby server

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: pg_stop_backup(wait_for_archive := true) on standby server
Date: 2017-08-14 15:28:16
Message-ID: 20170814152816.GF4628@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert, Michael,

* Michael Paquier (michael(dot)paquier(at)gmail(dot)com) wrote:
> On Sun, Aug 6, 2017 at 3:22 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > All of (1)-(3) are legitimate user choices, although not everyone will
> > make them. (4) is unfortunately the procedure recommended by our
> > documentation, which is where the problem comes in. I think it's
> > pretty lame that the documentation recommends ignoring the return
> > value of pg_stop_backup(); ISTM that it would be very wise to
> > recommend cross-checking the return value against the WAL files you're
> > keeping for the backup. Even if we thought the waiting logic was
> > working perfectly in every case, having a cross-check on something as
> > important as backup integrity seems like an awfully good idea.
>
> I would got a little bit more far and say that this is mandatory as
> the minimum recovery point that needs to be reached is the LSN
> returned by pg_stop_backup(). For backups taken from primaries, this
> is a larger deal because XLOG_BACKUP_END may not be present if the
> last WAL segment switched is not in the archive. For backups taken
> from standbys, the thing is more tricky as the control file should be
> backed up last. I would think that the best thing we could do is to
> extend pg_stop_backup a bit more so as it returns the control file to
> write in the backup using a bytea to hold the data for example.

Indeed, getting this all correct isn't trivial and it's really
unfortunate that our documentation in this area is really lacking.
Further, having things not actually work as the documentation claims
isn't exactly helping.

> > I think the root of this problem is that commit
> > 7117685461af50f50c03f43e6a622284c8d54694 did only a pretty perfunctory
> > update of the documentation, as the commit message itself admits:
> >
> > Only reference documentation is included. The main section on
> > backup still needs
> > to be rewritten to cover this, but since that is already scheduled
> > for a separate
> > large rewrite, it's not included in this patch.
> >
> > But it doesn't look like that ever really got done.
> > https://www.postgresql.org/docs/10/static/continuous-archiving.html#backup-lowlevel-base-backup
> > really doesn't talk at all about standbys or differences in required
> > procedures between masters and standbys. It makes statements that are
> > flagrantly riduculous in the case of a standby, like claiming that
> > pg_start_backup() always performs a checkpoint and that pg_stop_backup
> > "terminates the backup mode and performs an automatic switch to the
> > next WAL segment." Well, obviously not.
>
> Yep.

Indeed, that was something that I had also heard was being worked on,
but it's really unfortunate that it didn't happen.

> > And at least to me, that's the real bug here. Somebody needs to go
> > through and fix this documentation properly.
>
> So, Magnus? :)

I continue to be off the opinion that rewriting the documentation in
this case to match what we actually do is pretty unfriendly to users who
have built tools using the documentation at the time. Evidently, that
ship has sailed, however, but we didn't help things here by only
changing how PG10 works and not also at least updating the documentation
to be accurate.

I've poked Magnus... more directly regarding this and offered to have
someone help with the development of better documentation in this area.
Hopefully we can get the docs in the back-branches fixed for the next
round of minor releases, and call out those updates in the release
notes, at least.

Thanks!

Stephen

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-08-14 15:40:12 Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)
Previous Message Peter Eisentraut 2017-08-14 15:17:12 Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values