Re: Recovery bug

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Recovery bug
Date: 2010-10-18 08:02:49
Message-ID: AANLkTi=ffepY1tpduZBxLGBycg7NekF=KuacC4MW=Wwj@mail.gmail.com
Lists: pgsql-bugs

>> Send a SIGQUIT to the postmaster to simulate a crash. When you bring it
>> back up, it thinks it is recovering from a backup, so it reads
>> backup_label. The checkpoint for the backup label is in 00...6, so it
>> reads that just fine. But then it tries to read the WAL starting at the
>> redo location from that checkpoint, which is in 00...5; that segment
>> doesn't exist, so it PANICs.
>>
>> Ordinarily you might say that this is just confusion over whether it's
>> recovering from a backup or not, and you just need to remove
>> backup_label and try again. But that doesn't work: at this point
>> StartupXLOG has already done two things:
>> 1. renamed the backup file to .old
>> 2. updated the control file

Good catch!

> I still think it would be nice if postgres knew whether it was restoring
> a backup or recovering from a crash, otherwise it's hard to
> automatically recover from failures. I thought about using the presence
> of recoveryRestoreCommand or PrimaryConnInfo to determine that. But it
> seemed potentially dangerous if the person restoring a backup simply
> forgot to set those, and then it tries restoring from the controldata
> instead (which is unsafe to do during a backup).

Yep, automatically deleting backup_label and continuing recovery seems
dangerous. How about just emitting a FATAL error when backup_label exists but
neither restore_command nor primary_conninfo is supplied? This seems simpler
than your proposed patch (i.e., checking whether the REDO location exists).
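A minimal sketch of the proposed check, written in Python for readability rather than as actual StartupXLOG code. The names RecoverySettings, check_backup_label and check_data_directory are hypothetical; restore_command and primary_conninfo stand in for the recovery.conf settings named above. The sketch assumes the check runs before backup_label is renamed to .old and before the control file is updated, so refusing to start leaves the data directory untouched.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class RecoverySettings:
    # Hypothetical stand-ins for the recovery.conf settings discussed in the thread.
    restore_command: Optional[str] = None
    primary_conninfo: Optional[str] = None


def _is_set(value: Optional[str]) -> bool:
    # A missing or blank value counts as "not supplied".
    return bool(value and value.strip())


def check_backup_label(backup_label_exists: bool,
                       settings: RecoverySettings) -> Optional[str]:
    """Return a FATAL message if startup should stop, or None to proceed.

    Assumed to run before anything renames backup_label or rewrites the
    control file, so stopping here is always safe to retry.
    """
    if backup_label_exists and not (
        _is_set(settings.restore_command) or _is_set(settings.primary_conninfo)
    ):
        return ("FATAL: backup_label exists but neither restore_command "
                "nor primary_conninfo is supplied")
    return None


def check_data_directory(datadir: Path,
                         settings: RecoverySettings) -> Optional[str]:
    # Hypothetical helper: look for backup_label in the given data directory.
    return check_backup_label((datadir / "backup_label").exists(), settings)


if __name__ == "__main__":
    print(check_backup_label(True, RecoverySettings()))
    print(check_backup_label(True, RecoverySettings(restore_command="cp /archive/%f %p")))
    print(check_backup_label(False, RecoverySettings()))
```

The check fails closed rather than guessing: an administrator who hits the FATAL either supplies restore_command or primary_conninfo (restoring a backup) or removes backup_label (recovering from a crash) and retries. The sketch does not try to decide which of the two is correct.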

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
