Re: bug of recovery?

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bug of recovery?
Date: 2011-09-30 06:57:02
Message-ID: CA+U5nMK+S+bqZwEpwQUaVoE+gzqspwq8rp-6TQPFP77kyGS1NQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 30, 2011 at 2:09 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Sep 29, 2011 at 11:12 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> On Sep29, 2011, at 13:49 , Simon Riggs wrote:
>>> This worries me slightly now though because the patch makes us PANIC
>>> in a place we didn't used to and once we do that we cannot restart the
>>> server at all. Are we sure we want that? It's certainly a great way to
>>> shake down errors in other code...
>>
>> The patch only introduces a new PANIC condition during archive recovery,
>> though. Crash recovery is unaffected, except that we no longer create
>> restart points before we reach consistency.
>>
>> Also, if we hit an invalid page reference after reaching consistency,
>> the cause is probably either a bug in our recovery code, or (quite unlikely)
>> a corrupted WAL that passed the CRC check. In both cases, the likelyhood
>> of data-corruption seems high, so PANICing seems like the right thing to do.
>
> Fair enough.
>
> We might be able to use FATAL or ERROR instead of PANIC because they
> also cause all processes to exit when the startup process emits them.
> For example, we now use FATAL to stop the server in recovery mode
> when recovery is about to end before we've reached a consistent state.

I think we should issue PANIC if the source is a critical rmgr, or
just WARNING if from a non-critical rmgr, such as indexes.

Ideally, I think we should have a mechanism to allow indexes to be
marked corrupt. For example, a file that if present shows that the
index is corrupt and would be marked not valid. We can then create the
file and send a sinval message to force the index relcache to be
rebuilt showing valid set to false.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2011-09-30 07:18:53 Re: [REVIEW] pg_last_xact_insert_timestamp
Previous Message Kyotaro HORIGUCHI 2011-09-30 02:24:43 Re: [REVIEW] pg_last_xact_insert_timestamp