Re: bug of recovery?

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bug of recovery?
Date: 2011-09-29 11:49:11
Message-ID: CA+U5nMKDoseVh4usTz7B-N_XCFq-JHMFy2Q6=epy2udsS=e+yA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 29, 2011 at 12:31 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Sep 27, 2011 at 8:06 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> On Sep27, 2011, at 07:59 , Heikki Linnakangas wrote:
>>> On 27.09.2011 00:28, Florian Pflug wrote:
>>>> On Sep26, 2011, at 22:39 , Tom Lane wrote:
>>>>> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as
>>>>> we (think we) have reached consistency, rather than leaving it to be
>>>>> done only when we exit recovery mode.
>>>>
>>>> I believe we also need to prevent the creation of restart points before
>>>> we've reached consistency.
>>>
>>> Seems reasonable. We could still allow restartpoints when the hash table is empty, though. And once we've reached consistency, we can throw an error immediately in log_invalid_page(), instead of adding the entry in the hash table.
>>
>> That mimics the way the rm_safe_restartpoint callbacks work, which is good.
>>
>> Actually, why don't we use that machinery to implement this? There's currently no rm_safe_restartpoint callback for RM_XLOG_ID, so we'd just need to create one that checks whether invalid_page_tab is empty.
>
> Okay, the attached patch prevents the creation of restartpoints by using
> rm_safe_restartpoint callback if we've not reached a consistent state yet
> and the invalid-page table is not empty. But the invalid-page table is not
> tied to the specific resource manager, so using rm_safe_restartpoint for
> that seems to slightly odd. Is this OK?
>
> Also, according to other suggestions, the patch changes XLogCheckInvalidPages()
> so that it's called as soon as we've reached a consistent state, and changes
> log_invalid_page() so that it emits PANIC immediately if consistency is already
> reached. These are very good changes, I think. Because they enable us to
> notice serious problem which causes PANIC error immediately. Without these
> changes, you unfortunately might notice that the standby database is corrupted
> when failover happens. Though such a problem might rarely happen, I think it's
> worth doing those changes.

Patch does everything we agreed it should.

Good suggestion from Florian.

This worries me slightly now though because the patch makes us PANIC
in a place we didn't used to and once we do that we cannot restart the
server at all. Are we sure we want that? It's certainly a great way to
shake down errors in other code...

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2011-09-29 11:52:54 Re: pg_upgrade - add config directory setting
Previous Message Fujii Masao 2011-09-29 11:31:11 Re: bug of recovery?