Re: Add exclusive backup deprecation notes to documentation

From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Magnus Hagander <magnus(at)hagander(dot)net>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Martín Marqués <martin(at)2ndquadrant(dot)com>
Subject: Re: Add exclusive backup deprecation notes to documentation
Date: 2019-03-20 17:08:53
Message-ID: 3e4899a6-131f-2e93-ca22-4e174bdc1963@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert,

On 3/20/19 6:31 PM, Robert Haas wrote:
> On Wed, Mar 20, 2019 at 9:00 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> On Wed, Mar 20, 2019 at 04:29:35PM +0400, David Steele wrote:
>>> Please note that there have been objections to the patch later in this
>>> thread by Peter and Robert. I'm not very interested in watering down the
>>> documentation changes as Peter suggests, but I think at the very least we
>>> should commit the added hints in the error message. For many users this
>>> error will be their first point of contact with the backup_label
>>> issue/behavior.
>>
>> The updates of the log message do not imply anything negative as I
>> read them as they mention to not remove the backup_label file. So
>> while we don't have an agreement about the docs, the log messages may
>> be able to be committed? Peter? Robert?
>>
>> "will result in a corruptED cluster" is more correct?
>
> I really like the proposed changes to the ereport() text. I think the
> "Be careful" hint is a really helpful way of phrasing it. I think
> "corrupt" as the patch has it is slightly better than "corrupted".
> Obviously, we have to make the updates for recovery.signal no matter
> what, and you could argue that part should be its own commit, but I
> like all of the changes.

Cool.

> I'm not too impressed with the documentation changes. A lot of the
> information being added is already present somewhere in that very same
> section. It's reasonable to revise the section so that the dangers
> stand out more clearly, but it doesn't seem good to revise it in a way
> that ends up duplicating the existing information.

OK.

> Here's my
> suggestion -- ditch the note at the end of the section and make the
> one at the beginning read like this:
>
> The exclusive backup method is deprecated and should be avoided.
> Prior to <productname>PostgreSQL</productname> 9.6, this was the only
> low-level method available, but it is now recommended that all users
> upgrade their scripts to use non-exclusive backups.
>
> Then, revise the first paragraph like this:
>
> The process for an exclusive backup is mostly the same as for a
> non-exclusive one, but it differs in a few key steps. This type of
> backup can only be taken on a primary and does not allow concurrent
> backups. Moreover, because it writes a backup_label file on the
> master, it can cause the master to fail to restart automatically after
> a crash. On the other hand, the erroneous removal of a backup_label
> file from a backup or standby is a common mistake which can can result
> in serious data corruption. If it is necessary to use this method,
> the following steps may be used.

This works for me as I feel like the cautions here (and below) are still
strongly worded. Peter?

> Later, where it says:
>
> Note that if the server crashes during the backup it may not be
> possible to restart until the <literal>backup_label</literal>
> file has been
> manually deleted from the <envar>PGDATA</envar> directory.
>
> Change it to read:
>
> As noted above, if the server crashes during the backup it may not be
> possible to restart until the <literal>backup_label</literal> file has
> been manually deleted from the <envar>PGDATA</envar> directory. Note
> that it is very important to never to remove the
> <literal>backup_label</literal> file when restoring a backup, because
> this will result in corruption. Confusion about the circumstances
> under which it is appropriate to remove this file is a common cause of
> data corruption when using this method; be very certain that you
> remove the file only on an existing master and never when building a
> standby or restoring a backup, even if you are building a standby that
> will subsequently be promoted to a new master.

Technically we are into repetition here, but I'm certainly OK with it as
this point bears repeating.

> I also think we should revise this thoroughly terrible advice:
>
> If you wish to place a time limit on the execution of
> <function>pg_stop_backup</function>, set an appropriate
> <varname>statement_timeout</varname> value, but make note that if
> <function>pg_stop_backup</function> terminates because of this your backup
> may not be valid.
>
> That seems awful, not only because it encourages people to do that
> particular thing and end up leaving the server in backup mode, but
> also because it doesn't clearly articulate the extreme importance of
> making sure that the server is not left in backup mode. So I would
> propose that we strike that text entirely and replace it with
> something like:
>
> When using exclusive backup mode, it is absolutely imperative to make
> sure that <function>pg_stop_backup</function> completes successfully
> at the end of the backup. Even if the backup itself fails, for
> example due to lack of disk space, failure to call
> <function>pg_stop_backup</function> will leave the server in backup
> mode indefinitely, causing future backups to fail and increasing the
> risk of a restart during a time when <literal>backup_label</literal>
> exists.

It's also pretty important that exclusive backups complete successfully
since backup_label is returned from pg_stop_backup() -- the backup will
definitely be corrupt without that if there was a checkpoint during the
backup. But, yeah, leaving a backup_label around for a long time
increases the restart risks a lot.

I'll revise the patch if Peter thinks this approach looks reasonable.

Thanks,
--
-David
david(at)pgmasters(dot)net

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Euler Taveira 2019-03-20 17:25:30 Re: PostgreSQL pollutes the file system
Previous Message Alexander Kuzmenkov 2019-03-20 16:58:21 Re: Optimze usage of immutable functions as relation