Re: reassure me that it's good to copy pg_control last in a base backup

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Chapman Flack <chap(at)anastigmatix(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reassure me that it's good to copy pg_control last in a base backup
Date: 2017-12-22 05:29:27
Message-ID: 20171222052927.GA15816@paquier.xyz
Lists: pgsql-hackers

On Thu, Dec 21, 2017 at 10:48:49PM -0500, Chapman Flack wrote:
> From that description alone, I'd imagine a danger in redoing from a
> base backup in which pg_control was copied last. What if another
> checkpoint was made (after the one done by pg_start_backup) during
> the course of the backup, and the late-copied pg_control refers to
> it, but some of the files had been copied into the base backup
> too early to reflect it?

As long as you have a backup_label file to guarantee the start position
of recovery, that's not something to worry about. What would be bad is
to remove the backup_label file from a backup, which exposes you to
the risk of corrupting the instance. The description in the docs applies
to crash recovery, where there is no backup_label file. Now you see why
the exclusive backup API can lead to problems? Imagine the case where
you take an exclusive backup and the instance from which the backup is
taken crashes, *with* a backup_label file on disk. Oops. That's one
reason behind non-exclusive backups, which is what pg_basebackup
uses as well.
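
To make the start-position rule concrete, here is a minimal sketch in
Python. It is a simplified model of the behavior described above, not
the server's actual code: parse_backup_label and recovery_start_lsn are
hypothetical helper names, and the sample label text only imitates the
usual "KEY: value" layout of a backup_label file.

```python
from typing import Optional


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' (hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_backup_label(label: str) -> dict:
    """Collect 'KEY: value' pairs from backup_label text (hypothetical helper)."""
    fields = {}
    for line in label.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


def recovery_start_lsn(control_checkpoint: str,
                       backup_label: Optional[str]) -> int:
    """Pick the checkpoint that recovery starts from (simplified model)."""
    if backup_label is not None:
        # With a backup_label present, its checkpoint location wins and
        # the checkpoint pointer in pg_control is not relied on.
        fields = parse_backup_label(backup_label)
        return parse_lsn(fields["CHECKPOINT LOCATION"])
    # Crash recovery: no backup_label, so pg_control decides.
    return parse_lsn(control_checkpoint)


# Illustrative values only: pg_control was copied late and points at a
# newer checkpoint than the one taken when the backup started.
SAMPLE_LABEL = (
    "START WAL LOCATION: 0/2000028 (file 000000010000000000000002)\n"
    "CHECKPOINT LOCATION: 0/2000060\n"
    "BACKUP FROM: master\n"
)
start = recovery_start_lsn("0/5000060", SAMPLE_LABEL)
print(hex(start))
```

In this model, deleting the label from the backup silently switches the
start point to whatever checkpoint the copied pg_control names, which
is the corruption risk mentioned above.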

> Looking harder, I think I see that the special care to grab
> pg_control last was introduced for the case of taking a base backup
> from a standby, and perhaps only matters in that case. The long
> discussion seems to be this one:
>
> https://www.postgresql.org/message-id/201108050646.p756kHC5023570%40ccmds32.silk.ntts.co.jp

Copying pg_control last in the backup matters only for backups taken from
standbys, where you want to maximize the LSN position for minRecoveryPoint
so that you minimize the risk of facing inconsistent data at recovery.
When taking a backup from a primary server, the WAL record marking the
end of the backup serves as a guarantee that a consistent point has been
reached, so it does not matter whether the control file is copied first
or last in this case.
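
The difference between the two cases can be sketched as follows. This
is an illustrative model only, assuming LSNs are plain integers;
consistency_reached and its parameters are hypothetical names, not
PostgreSQL functions.

```python
from typing import Optional


def consistency_reached(replayed_lsn: int,
                        backup_from_standby: bool,
                        backup_end_record_lsn: Optional[int],
                        min_recovery_point: int) -> bool:
    """Decide whether WAL replay has reached a consistent state (simplified)."""
    if backup_from_standby:
        # A standby backup has no backup-end WAL record, so the only
        # marker is minRecoveryPoint from the copied pg_control.
        return replayed_lsn >= min_recovery_point
    # Primary backup: consistent once the backup-end record is replayed;
    # the value stored in pg_control plays no role in this model.
    return (backup_end_record_lsn is not None
            and replayed_lsn >= backup_end_record_lsn)


# Standby backup, replay currently at LSN 150 (made-up numbers).
# pg_control copied early carries a low minRecoveryPoint (100) and
# reports consistency too soon; copied last (200), it does not.
early = consistency_reached(150, True, None, 100)
late = consistency_reached(150, True, None, 200)
print(early, late)
```

The later pg_control is copied, the higher the minRecoveryPoint it
carries, and the less likely the standby case is to declare consistency
before all copied files have been brought up to date.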

> What I think I've gleaned is:
>
> 1. The description in the doc ("at the start of recovery, the server
> first reads pg_control and the checkpoint record") only applies to
> the kind of recovery that happens in an unexpected restart, using
> the files that are present; it's not the whole story for the kind
> of recovery that begins with a base backup.

Yes, that's crash recovery. But see the case I just described above
of an instance crashing while an exclusive backup is running.

> 2. In the case of recovery from a backup (that was taken from a master),
> both the start and end location in pg_control are disregarded, in
> favor of the backup label file and the backup end WAL record,
> respectively, so it doesn't matter a whit whether pg_control was
> copied early or late.

Yes.

> 3. In recovery from a backup taken from a standby, there is a backup
> label file but no backup end WAL record, so the 'minimum recovery
> ending location' in pg_control has to be used, and that's why the
> fuss about copying pg_control last when backing up from a standby.

Yes.

> Did I get that right? If so, would it be worth adding some words
> to that paragraph in "WAL Internals", to clarify that the pg_control
> checkpoint position is not relied on when starting recovery with
> a backup label present, and therefore it isn't scary to copy pg_control
> late in the backup?

I would be interested in seeing a patch for that. People tend to
remove backup_label files too easily, so hardening the documentation
a bit could be an idea worth digging into.
--
Michael
