Re: Return pg_control from pg_backup_stop().

From: David Steele <david(at)pgbackrest(dot)org>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Haibo Yan <tristan(dot)yim(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Return pg_control from pg_backup_stop().
Date: 2026-03-18 07:35:47
Message-ID: 3b23e3b7-53d2-4784-b482-05cca3327acb@pgbackrest.org
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/18/26 11:53, Michael Paquier wrote:
> On Wed, Mar 18, 2026 at 04:05:24AM +0000, David Steele wrote:
>> On 3/18/26 08:43, Michael Paquier wrote:
>>> I was wondering if we should have an assertion at least to cross-check
>>> that the contents we store in shared memory never go out-of-sync with
>>> the on-disk contents, in the shape of a USE_ASSERT_CHECKING block that
>>> calls get_controlfile() and memcmp()'s the contents between shmem and
>>> the on-disk file, while the LWLock is taken. We ship the control file
>>> last on purpose, one reason being backups taken from standbys, so that
>>> may be sensible to do.
>>
>> As far as I can see this should always be true -- I audited all the
>>
>> LWLockAcquire(ControlFileLock, LW_EXCLUSIVE)
>>
>> sections and the file is always saved once if is updated. Let me see if I
>> can add this check without too much pain, e.g. an additional parameter.
>
> This matches with my reads of the code. The attached check, that can
> be applied on top of your patches, passes under check-world.

Looks good to me. I have integrated it into the attached patches. I just
changed it to look at our copy instead of the global ControlData since
that is the one we are going to save.

>> Using the pg_control copy from pg_backup_stop() is entirely optional and
>> nothing is broken if vendors decide not to use it. This can be demonstrated
>> by applying the 01 patch without 02. In that case the tests in
>> 042_low_level_backup continue to run. You can also apply 01 and 02 and
>> revert the test changes in 042_low_level_backup and that works, too.
>
> FWIW, after a second look I am actually wondering if 0002 is safe at
> all. The contents of the control file are fetched after we are done
> with do_pg_backup_stop(), and there could be a bunch of activity that
> happens between the end of do_pg_backup_stop() and the
> backup_control_file() call, where the control file could be updated
> more and interfere with the recovery startup for some of its fields?
> GUC parameter updates that may touch the control file are one thing
> popping into mind.

You are correct -- the copy of pg_control needs to happen before
do_pg_backup_stop(). An older version of this patch saved pg_control in
backup_state which made the prior location safe. However, I missed
moving this code when I moved pg_control out of backup_state. Code
review to the rescue.

I believe I have addressed all current review comments in the attached
patches. I tested Postgres crashing right after pg_control is updated
but before backup_label is renamed. It worked as expected on restart. I
also manually removed backup_label before restarting and that worked as
well. I wrote a comment documenting all that.

One thing I could do is note in the documentation that it is not
strictly necessary to get pg_control from pg_backup_stop(). Right now it
sounds like copying from disk is no longer an option -- but it is if you
are willing to accept the possibility of pg_control being torn. But I'd
hope most well-maintained backup software is taking care of that by now.

The problem with caveats in the docs is it can lead to confusion and
getting pg_control from pg_backup_stop() is just a better idea in general.

Thoughts?

Regards,
-David

Attachment Content-Type Size
pgcontrol-flag-v9-01-basebackup.patch text/plain 12.4 KB
pgcontrol-flag-v9-02-sql.patch text/plain 9.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Imran Zaheer 2026-03-18 07:43:37 Re: [WIP] Pipelined Recovery
Previous Message Peter Eisentraut 2026-03-18 07:29:09 Re: SQL Property Graph Queries (SQL/PGQ)