| From: | David Steele <david(at)pgbackrest(dot)org> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Haibo Yan <tristan(dot)yim(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
| Subject: | Re: Return pg_control from pg_backup_stop(). |
| Date: | 2026-03-18 07:35:47 |
| Message-ID: | 3b23e3b7-53d2-4784-b482-05cca3327acb@pgbackrest.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 3/18/26 11:53, Michael Paquier wrote:
> On Wed, Mar 18, 2026 at 04:05:24AM +0000, David Steele wrote:
>> On 3/18/26 08:43, Michael Paquier wrote:
>>> I was wondering if we should have an assertion at least to cross-check
>>> that the contents we store in shared memory never go out-of-sync with
>>> the on-disk contents, in the shape of a USE_ASSERT_CHECKING block that
>>> calls get_controlfile() and memcmp()'s the contents between shmem and
>>> the on-disk file, while the LWLock is taken. We ship the control file
>>> last on purpose, one reason being backups taken from standbys, so that
>>> may be sensible to do.
>>
>> As far as I can see this should always be true -- I audited all the
>>
>> LWLockAcquire(ControlFileLock, LW_EXCLUSIVE)
>>
>> sections and the file is always saved once if is updated. Let me see if I
>> can add this check without too much pain, e.g. an additional parameter.
>
> This matches with my reads of the code. The attached check, that can
> be applied on top of your patches, passes under check-world.
Looks good to me. I have integrated it into the attached patches. I just
changed it to look at our copy instead of the global ControlData since
that is the one we are going to save.
>> Using the pg_control copy from pg_backup_stop() is entirely optional and
>> nothing is broken if vendors decide not to use it. This can be demonstrated
>> by applying the 01 patch without 02. In that case the tests in
>> 042_low_level_backup continue to run. You can also apply 01 and 02 and
>> revert the test changes in 042_low_level_backup and that works, too.
>
> FWIW, after a second look I am actually wondering if 0002 is safe at
> all. The contents of the control file are fetched after we are done
> with do_pg_backup_stop(), and there could be a bunch of activity that
> happens between the end of do_pg_backup_stop() and the
> backup_control_file() call, where the control file could be updated
> more and interfere with the recovery startup for some of its fields?
> GUC parameter updates that may touch the control file are one thing
> popping into mind.
You are correct -- the copy of pg_control needs to happen before
do_pg_backup_stop(). An older version of this patch saved pg_control in
backup_state which made the prior location safe. However, I missed
moving this code when I moved pg_control out of backup_state. Code
review to the rescue.
I believe I have addressed all current review comments in the attached
patches. I tested Postgres crashing right after pg_control is updated
but before backup_label is renamed. It worked as expected on restart. I
also manually removed backup_label before restarting and that worked as
well. I wrote a comment documenting all that.
One thing I could do is note in the documentation that it is not
strictly necessary to get pg_control from pg_backup_stop(). Right now it
sounds like copying from disk is no longer an option -- but it is if you
are willing to accept the possibility of pg_control being torn. But I'd
hope most well-maintained backup software is taking care of that by now.
The problem with caveats in the docs is it can lead to confusion and
getting pg_control from pg_backup_stop() is just a better idea in general.
Thoughts?
Regards,
-David
| Attachment | Content-Type | Size |
|---|---|---|
| pgcontrol-flag-v9-01-basebackup.patch | text/plain | 12.4 KB |
| pgcontrol-flag-v9-02-sql.patch | text/plain | 9.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Imran Zaheer | 2026-03-18 07:43:37 | Re: [WIP] Pipelined Recovery |
| Previous Message | Peter Eisentraut | 2026-03-18 07:29:09 | Re: SQL Property Graph Queries (SQL/PGQ) |