Re: Use of backup_label not noted in log

From: Andres Freund <andres(at)anarazel(dot)de>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: Use of backup_label not noted in log
Date: 2023-11-18 18:01:42
Message-ID: 20231118180142.gaitteticpeccg5g@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-11-17 06:41:46 +0100, Laurenz Albe wrote:
> On Thu, 2023-11-16 at 20:18 -0800, Andres Freund wrote:
> > I've often had to analyze what caused corruption in PG instances, where the
> > symptoms match not having had backup_label in place when bringing on the
> > node. However that's surprisingly hard - the only log messages that indicate
> > use of backup_label are at DEBUG1.
> >
> > Given how crucial use of backup_label is and how frequently people do get it
> > wrong, I think we should add a LOG message - it's not like use of backup_label
> > is a frequent thing in the life of a postgres instance and is going to swamp
> > the log.  And I think we should backpatch that addition.
>
> +1
>
> I am not sure about the backpatch: it is not a bug, and we should not wantonly
> introduce new log messages in a minor release. Some monitoring system may
> get confused.

I think log monitoring need (and do) handle unknown log messages
gracefully. You're constantly encountering them. If were to change an
existing log message in the back branches it'd be a different story.

The reason for backpatching is that this is by far the most common reason for
corrupted systems in the wild that I have seen. And there's no way to
determine from the logs whether something has gone right or wrong - not really
a bug, but a pretty substantial weakness. And we're going to have to deal with
< 17 releases for 5 years, so making this at least somewhat diagnosable seems
like a good idea.

> What about adding it to the "redo starts at" message, something like
>
> redo starts at 12/12345678, taken from control file
>
> or
>
> redo starts at 12/12345678, taken from backup label

I think it'd make sense to log use of backup_label earlier than that - the
locations from backup_label might end up not being available in the archive,
the primary or locally, and we'll error out with "could not locate a valid
checkpoint record".

I'd probably just do it within the if (read_backup_label()) block in
InitWalRecovery(), *before* the ReadCheckpointRecord().

I do like the idea of expanding the "redo starts at" message
though. E.g. including minRecoveryLSN, ControlFile->backupStartPoint,
ControlFile->backupEndPoint would provide information about when the node
might become consistent.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-11-18 18:05:46 Re: Use of backup_label not noted in log
Previous Message Pavel Stehule 2023-11-18 17:28:53 Re: Schema variables - new implementation for Postgres 15