Re: Add exclusive backup deprecation notes to documentation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: David Steele <david(at)pgmasters(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Martín Marqués <martin(at)2ndquadrant(dot)com>
Subject: Re: Add exclusive backup deprecation notes to documentation
Date: 2019-03-20 14:31:46
Message-ID: CA+TgmoasYFa5AVJtVup2AkWXXNr7uaH22Vo3TNzSAEvOKeAVkA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 20, 2019 at 9:00 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Wed, Mar 20, 2019 at 04:29:35PM +0400, David Steele wrote:
> > Please note that there have been objections to the patch later in this
> > thread by Peter and Robert. I'm not very interested in watering down the
> > documentation changes as Peter suggests, but I think at the very least we
> > should commit the added hints in the error message. For many users this
> > error will be their first point of contact with the backup_label
> > issue/behavior.
>
> The updates of the log message do not imply anything negative as I
> read them as they mention to not remove the backup_label file. So
> while we don't have an agreement about the docs, the log messages may
> be able to be committed? Peter? Robert?
>
> "will result in a corruptED cluster" is more correct?

I really like the proposed changes to the ereport() text. I think the
"Be careful" hint is a really helpful way of phrasing it. I think
"corrupt" as the patch has it is slightly better than "corrupted".
Obviously, we have to make the updates for recovery.signal no matter
what, and you could argue that part should be its own commit, but I
like all of the changes.

I'm not too impressed with the documentation changes. A lot of the
information being added is already present somewhere in that very same
section. It's reasonable to revise the section so that the dangers
stand out more clearly, but it doesn't seem good to revise it in a way
that ends up duplicating the existing information. Here's my
suggestion -- ditch the note at the end of the section and make the
one at the beginning read like this:

The exclusive backup method is deprecated and should be avoided.
Prior to <productname>PostgreSQL</productname> 9.6, this was the only
low-level method available, but it is now recommended that all users
upgrade their scripts to use non-exclusive backups.

Then, revise the first paragraph like this:

The process for an exclusive backup is mostly the same as for a
non-exclusive one, but it differs in a few key steps. This type of
backup can only be taken on a primary and does not allow concurrent
backups. Moreover, because it writes a backup_label file on the
master, it can cause the master to fail to restart automatically after
a crash. On the other hand, the erroneous removal of a backup_label
file from a backup or standby is a common mistake which can can result
in serious data corruption. If it is necessary to use this method,
the following steps may be used.

Later, where it says:

Note that if the server crashes during the backup it may not be
possible to restart until the <literal>backup_label</literal>
file has been
manually deleted from the <envar>PGDATA</envar> directory.

Change it to read:

As noted above, if the server crashes during the backup it may not be
possible to restart until the <literal>backup_label</literal> file has
been manually deleted from the <envar>PGDATA</envar> directory. Note
that it is very important to never to remove the
<literal>backup_label</literal> file when restoring a backup, because
this will result in corruption. Confusion about the circumstances
under which it is appropriate to remove this file is a common cause of
data corruption when using this method; be very certain that you
remove the file only on an existing master and never when building a
standby or restoring a backup, even if you are building a standby that
will subsequently be promoted to a new master.

I also think we should revise this thoroughly terrible advice:

If you wish to place a time limit on the execution of
<function>pg_stop_backup</function>, set an appropriate
<varname>statement_timeout</varname> value, but make note that if
<function>pg_stop_backup</function> terminates because of this your backup
may not be valid.

That seems awful, not only because it encourages people to do that
particular thing and end up leaving the server in backup mode, but
also because it doesn't clearly articulate the extreme importance of
making sure that the server is not left in backup mode. So I would
propose that we strike that text entirely and replace it with
something like:

When using exclusive backup mode, it is absolutely imperative to make
sure that <function>pg_stop_backup</function> completes successfully
at the end of the backup. Even if the backup itself fails, for
example due to lack of disk space, failure to call
<function>pg_stop_backup</function> will leave the server in backup
mode indefinitely, causing future backups to fail and increasing the
risk of a restart during a time when <literal>backup_label</literal>
exists.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2019-03-20 14:39:27 Re: PostgreSQL pollutes the file system
Previous Message Jesper Pedersen 2019-03-20 14:25:05 Re: speeding up planning with partitions