Re: Detecting some cases of missing backup_label

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: Detecting some cases of missing backup_label
Date: 2023-12-21 11:37:46
Message-ID: 20231221113746.7nf3jemf6nri7k72@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-12-20 13:11:37 -0400, David Steele wrote:
> I've run this through a bunch of scenarios (in my head) with parallel
> backups and it does seem to hold up.
>
> I think we'd need to write the state file before XLOG_BACKUP_START just in
> case. Seems better to have an extra state file rather than have one be
> missing.

That'd very significantly weaken the approach, afaict, because an "external"
base backup could end up copying those files. The whole point is to detect
broken procedures, so relying on such files being excluded from the base
backup seems like a bad idea.

I also see no need to do so: we'd only verify that a backup start has been
replayed when replaying XLOG_BACKUP_STOP, so there's no danger in not
creating the files during XLOG_BACKUP_START and instead creating them just
before logging the XLOG_BACKUP_STOP.
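
To make the check concrete, here is a rough toy model in C. It is not
PostgreSQL code: the in-memory set stands in for the state files, and every
name in it (BackupStateSet, replay_backup_start, replay_backup_stop) is
hypothetical. It assumes that replaying a backup start leaves a marker behind
and that replaying XLOG_BACKUP_STOP only checks for that marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BACKUPS 8   /* arbitrary limit for this toy model */
#define ID_LEN 32       /* arbitrary maximum id length, including NUL */

/* Stand-in for the on-disk state files: one slot per backup start seen. */
typedef struct
{
    char ids[MAX_BACKUPS][ID_LEN];
    bool used[MAX_BACKUPS];
} BackupStateSet;

static void
state_init(BackupStateSet *s)
{
    memset(s, 0, sizeof(*s));
}

/* Return the slot holding this backup id, or -1 if there is none. */
static int
state_find(const BackupStateSet *s, const char *id)
{
    for (int i = 0; i < MAX_BACKUPS; i++)
        if (s->used[i] && strcmp(s->ids[i], id) == 0)
            return i;
    return -1;
}

/* Replay of a backup start: remember it. Returns false if no slot is free. */
static bool
replay_backup_start(BackupStateSet *s, const char *id)
{
    assert(strlen(id) < ID_LEN);

    if (state_find(s, id) >= 0)
        return true;            /* already recorded */

    for (int i = 0; i < MAX_BACKUPS; i++)
    {
        if (!s->used[i])
        {
            strcpy(s->ids[i], id);
            s->used[i] = true;
            return true;
        }
    }
    return false;
}

/*
 * Replay of XLOG_BACKUP_STOP: the only place where anything is verified.
 * Returns false if the matching start was never replayed, i.e. recovery
 * began somewhere after the backup start.
 */
static bool
replay_backup_stop(const BackupStateSet *s, const char *id)
{
    return state_find(s, id) >= 0;
}
```

In this model, a recovery that replays the start and then the stop of backup
"b1" passes the check, while one that sees the stop without the start fails
it. Nothing is checked at start time.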

> I'm a little worried about what happens if a state file goes missing, but I
> guess that could be true of any file in PGDATA.

Yea, that seems like a non-issue to me.

> Probably we'd want to exclude *all* state files from backups, though.

I don't think so - I think we want the opposite? As noted above, I think in a
safety net like this we shouldn't assume that backup procedures were followed
correctly.

> Seems like in various PITR scenarios it could be hard to determine when to
> remove them.

Why? I think we can basically remove the files:

a) after the checkpoint during which XLOG_BACKUP_STOP was replayed - I think
we already have the infrastructure to queue file deletions that we can hook
into
b) when replaying a shutdown checkpoint / after creation of a shutdown
checkpoint
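
Roughly, the two removal rules could be modelled like this. Again a toy
sketch in C rather than PostgreSQL code: StateFiles, queue_delete_after_stop
and checkpoint_done are made-up names, and the arrays stand in for real files
and for whatever deletion queue already exists.

```c
#include <stdbool.h>

#define MAX_FILES 8     /* arbitrary limit for this toy model */

typedef struct
{
    bool present[MAX_FILES];        /* state file i exists */
    bool pending_delete[MAX_FILES]; /* its XLOG_BACKUP_STOP was replayed */
} StateFiles;

/* Rule a), first half: replaying the stop record only queues the deletion. */
static void
queue_delete_after_stop(StateFiles *f, int i)
{
    if (i >= 0 && i < MAX_FILES && f->present[i])
        f->pending_delete[i] = true;
}

/*
 * Called once a checkpoint has completed. Returns the number of files removed.
 *
 * Rule a), second half: an ordinary checkpoint removes only queued files.
 * Rule b): a shutdown checkpoint removes every remaining state file.
 */
static int
checkpoint_done(StateFiles *f, bool is_shutdown)
{
    int removed = 0;

    for (int i = 0; i < MAX_FILES; i++)
    {
        if (f->present[i] && (is_shutdown || f->pending_delete[i]))
        {
            f->present[i] = false;
            f->pending_delete[i] = false;
            removed++;
        }
    }
    return removed;
}
```

With two files present and only the first one queued, an ordinary checkpoint
removes one file and leaves the other; a later shutdown checkpoint removes the
remaining one.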

Greetings,

Andres Freund
