Re: pg_stop_backup(wait_for_archive := true) on standby server

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stop_backup(wait_for_archive := true) on standby server
Date: 2017-08-05 16:07:16
Message-ID: CAB7nPqSkcXT31YWbfJHikZFsT30MmYGJt045TewQLa1k=ynykw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Aug 5, 2017 at 4:51 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Aug 5, 2017 at 9:11 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> If we apply these patches to 9.6, then pg_stop_backup() on a standby
>> will start writing backup history files and removing no-longer-needed
>> backup history files. That's a clear behavior change, and it isn't a
>> bug fix. Making the waitforarchive option work is a bug fix, but I'm
>> not convinced we should take it as a license to change other aspects
>> of the behavior in a point release.
>
> After refreshing my memory further, I take it back. pg_stop_backup()
> doesn't even have a second argument on v9.6, so back-porting this fix
> to 9.6 is a meaningless thing; there's nothing to fix. What the
> patches propose to do there instead is adopt the behavior proposed for
> v10 when pg_stop_backup()'s second argument is true as the
> unconditional behavior in v9.6. For that to be the right thing to do,
> we have to decide that the current v9.6 behavior is a bug. But I
> (still) think that's very arguable, because the whole point is that
> we've just finished adding a flag in v10 to disable on the master the
> *very same behavior* we're proposing to mandate on the standby in
> v9.6. How can we say that v9.6's current behavior is a bug when v10
> produces the same behavior with wait_for_archive = false? That just
> doesn't make any sense.

Because default values should be safe in the backup and restore area,
and wait_for_archive = false is not the default. I would like to point
out that the 9.6 behavior has been discussed as being a bug upthread
for 9.6 by three people (David, Sawada-san and I) as there is a real
risk to take inconsistent backups from standbys (a WAL segment may not
be archived when pg_stop_backup reports back, so the user may not have
all the WAL it would need to get back to a consistent state), and that
the default should be to get consistent and safe backups.

> I've pushed a minimal fix for v10 which should address Sawada-san's
> original complaint: now, if you say wait_for_archive = true on a
> standby, you'll get a warning if it didn't wait, or if archive_mode =
> always, it will wait.

Backup history files are around mainly for debugging purposes. While I
don't mind about the choice to not generate them on back-branches, the
inconsistency between primary and standbys in generating them is
really disturbing. We could have taken the occasion to address that
here as this is not invasive, but well... I do complain a lot about
keeping changes going to v10 non-invasive if possible. So no real
complain to do what's been done.

> I think the right thing to do about 9.6 is
> document the behavior; there's no problem here that a user can't work
> around by doing it right.

There are many ways for users to do it wrong in this area, that I am
of the opinion to give them safe defaults if we have a way to make
things work safe in the backend. And here we are talking about extra
checks to make sure that a WAL segment is correctly archived... I have
seen bugs lately in custom backup code which led to inconsistent
backups.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-08-05 16:56:46 Re: pg_stop_backup(wait_for_archive := true) on standby server
Previous Message Joe Conway 2017-08-05 15:36:03 git.postgresql.org (and other services) down