Too strict check when starting from a basebackup taken off a standby

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: eshkinkot(at)gmail(dot)com
Subject: Too strict check when starting from a basebackup taken off a standby
Date: 2014-12-11 03:45:01
Message-ID: 20141211034501.GA8139@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

A customer recently reported getting "backup_label contains data
inconsistent with control file" after taking a basebackup from a standby
and starting it with a typo in primary_conninfo.

When starting postgres from a basebackup StartupXLOG() has the follow
code to deal with backup labels:
if (haveBackupLabel)
{
ControlFile->backupStartPoint = checkPoint.redo;
ControlFile->backupEndRequired = backupEndRequired;

if (backupFromStandby)
{
if (dbstate_at_startup != DB_IN_ARCHIVE_RECOVERY)
ereport(FATAL,
(errmsg("backup_label contains data inconsistent with control file"),
errhint("This means that the backup is corrupted and you will "
"have to use another backup for recovery.")));
ControlFile->backupEndPoint = ControlFile->minRecoveryPoint;
}
}

while I'm not enthusiastic about the error message, that bit of code
looks sane at first glance. We certainly expect the control file to
indicate we're in recovery. Since we're unlinking the backup label
shortly afterwards we'd normally not expect to hit that case after a
shutdown in recovery.

The problem is that after reading the backup label we also have to read
the corresponding checkpoing from pg_xlog. If primary_conninfo and/or
restore_command are misconfigured and can't restore files that can only
be fixed by shutting down the cluster and fixing up recovery.conf -
which sets DB_SHUTDOWNED_IN_RECOVERY in the control file.

The easiest solution seems to be to simply also allow that as a state in
the above check. It might be nicer to not allow a ShutdownXLOG to modify
the control file et al at that stage, but I think that'd end up being
more invasive.

A short search shows that that also looks like a credible explanation
for #12128...

I plan to relax that check unless somebody comes up with a different &
better plan.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-12-11 03:57:30 Re: logical column ordering
Previous Message Etsuro Fujita 2014-12-11 03:09:01 Re: inherit support for foreign tables