Re: Return pg_control from pg_backup_stop().

From: David Steele <david(at)pgbackrest(dot)org>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Haibo Yan <tristan(dot)yim(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Return pg_control from pg_backup_stop().
Date: 2026-03-18 04:05:24
Message-ID: 1800c83c-264a-4183-9da5-ac78e25849a8@pgbackrest.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/18/26 08:43, Michael Paquier wrote:
> On Tue, Mar 17, 2026 at 11:50:29AM -0700, Haibo Yan wrote:
>> Thank you for the clarification. I have now read the code, and
>> overall it looks good to me. I only had one very small comment.
>
> (Bottom-posting note from above, please be careful.)
>
>> I was just wondering whether the following might be slightly clearer:
>> ```
>> memset(controlFile, 0, PG_CONTROL_FILE_SIZE);
>> memcpy(controlFile, ControlFile, sizeof(ControlFileData));
>> ```

Yeah, perhaps I am being too clever here. The reason why I like this
pattern:

memset(controlFile + sizeof(ControlFileData), 0,
PG_CONTROL_FILE_SIZE - sizeof(ControlFileData));

...is that valgrind will complain if the ControlFileData part is not
completely initialized later on.

But your version is what is generally used in the code so I'm fine with
doing it that way.

> {
> ControlFile->backupStartPoint = checkPoint.redo;
> ControlFile->backupEndRequired = backupEndRequired;
> + ControlFile->backupLabelRequired = false;
>
> It sounds like it is going to be important to document the reason why
> the flag is reset here (aka we don't need the backup_label file
> anymore).

Good point -- I'll add that in the next revision.

> +backup_control_file(uint8_t *controlFile)
> +{
> + ControlFileData *controlData = ((ControlFileData *)controlFile);
> +
> + memset(controlFile + sizeof(ControlFileData), 0,
> + PG_CONTROL_FILE_SIZE - sizeof(ControlFileData));
> +
> + LWLockAcquire(ControlFileLock, LW_SHARED);
> + memcpy(controlFile, ControlFile, sizeof(ControlFileData));
> + LWLockRelease(ControlFileLock);
> +
> + controlData->backupLabelRequired = true;
> +
> + INIT_CRC32C(controlData->crc);
> + COMP_CRC32C(controlData->crc, controlFile, offsetof(ControlFileData, crc));
> + FIN_CRC32C(controlData->crc);
>
> I was wondering if we should have an assertion at least to cross-check
> that the contents we store in shared memory never go out-of-sync with
> the on-disk contents, in the shape of a USE_ASSERT_CHECKING block that
> calls get_controlfile() and memcmp()'s the contents between shmem and
> the on-disk file, while the LWLock is taken. We ship the control file
> last on purpose, one reason being backups taken from standbys, so that
> may be sensible to do.

As far as I can see this should always be true -- I audited all the

LWLockAcquire(ControlFileLock, LW_EXCLUSIVE)

sections and the file is always saved once if is updated. Let me see if
I can add this check without too much pain, e.g. an additional parameter.

> Another property of the new control file flag that is implied in the
> implementation but not documented is that we should never check for
> backupLabelRequired when a backup_label is gone.

I'm not sure what you mean here? That's exactly when we do want to check
as below:

/*
* No backup_label file has been found if we are here. Error if the
* control file requires backup_label.
*/
if (ControlFile->backupLabelRequired)
ereport(FATAL,
(errmsg("could not find backup_label required for recovery"),
errhint("backup_label must be present for recovery to
proceed")));

> Actually, the flag> is reset in InitWalRecovery() in the initial
steps of recovery, and
> the backup_label file is removed much later in StartupXLOG() just
> *after* UpdateControlFile() to minimize the window where the contents
> of the control file and the backup_label file is removed are
> out-of-sync. This window means that if we crash between the
> completion of UpdateControlFile() and the durable_rename() we could
> have a flag reset with a backup_label still around. On restart,
> recovery would fail, requiring a manual modification of the control
> file, at least. It sounds to me that this implementation detail
> should be documented clearly.

I'll test it but I don't think this is the case. If backup_label is
present then we'll just update pg_control again as we do now.

When backup_label is present the value of backupLabelRequired does not
matter so it just follows the prior logic.

> Finally, here is a general opinion. I like this patch, and it is
> basically risk-free for base backups taken with the replication
> protocol as we update the control file with the new flag set
> on-the-fly.

Glad to hear it!

> Now, I am worried about backups that use pg_stop_backup(). Changing
> backup APIs has always been a very sensitive area, and this is going
> to require operators to update backup tools so as the control file
> received as a result of pg_stop_backup() is copied, at the cost of
> getting a failure if they don't do so.

Using the pg_control copy from pg_backup_stop() is entirely optional and
nothing is broken if vendors decide not to use it. This can be
demonstrated by applying the 01 patch without 02. In that case the tests
in 042_low_level_backup continue to run. You can also apply 01 and 02
and revert the test changes in 042_low_level_backup and that works, too.

Vendors can use the new feature if they want the protection of
backupLabelRequired and a guaranteed non-torn copy of pg_control but if
they do nothing then nothing breaks.

> I will *not* proceed with this
> change without a clear approval from some more committers or senior
> hackers that they like this change (approach previously suggested by
> Andres, actually, for what I can see).

You are correct and this was an omission on my part. If this gets to
commit we'll definitely want to mention that this flag was Andres'
suggestion.

> I am adding in CC a few
> committers who have commented on this set of proposals and who have
> touched the recovery code in the last few years, for awareness.
> The timing is what it is, and we are at the end of a release cycle.
> Let's see if we can reach a consensus of some kind.

Perfect. I'm looking forward to their input.

I'll hold off on a new patch version until we get some feedback since I
don't think any of the requested changes are critical to the functionality.

Regards,
-David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2026-03-18 04:10:24 Re: Skipping schema changes in publication
Previous Message Bertrand Drouvot 2026-03-18 03:57:48 Re: relfilenode statistics