Re: bug of recovery?

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bug of recovery?
Date: 2011-09-27 00:49:51
Message-ID: CAHGQGwH2O=-1etYFzE10ftsOsJQEU_RbGcD8PuE4w5CSqkvMMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 27, 2011 at 7:28 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Sep26, 2011, at 22:39 , Tom Lane wrote:
>> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as
>> we (think we) have reached consistency, rather than leaving it to be
>> done only when we exit recovery mode.
>
> I believe we also need to prevent the creation of restart points before
> we've reached consistency. If we're starting from an online backup,
> and a checkpoint occurred between pg_start_backup() and pg_stop_backup(),
> we currently create a restart point upon replaying that checkpoint's
> xlog record. At that point, however, unresolved page references are
> not an error, since a truncation that happened after the checkpoint
> (but before pg_stop_backup()) might or might not be reflected in the
> online backup.

Preventing the creation of restartpoints before reaching consistent point
sounds fragile to the case where the backup takes very long time. It might
also take very long time to reach consistent point when replaying from that
backup. Which prevents also the removal of WAL files (e.g., streamed from
the master server) for a long time, and then might cause disk full failure.

ISTM that writing an invalid-page table to the disk for every restartpoints is
better approach. If an invalid-page table is never updated after we've
reached consistency point, we probably should make restartpoints write
that table only after that point. And, if a reference to an invalid
page is found
after the consistent point, we should emit error and cancel a recovery.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2011-09-27 00:57:40 Re: random isolation test failures
Previous Message Tom Lane 2011-09-27 00:43:45 Re: [BUGS] BUG #6218: TRAP: FailedAssertion( "!(owner->nsnapshots == 0)", File: "resowner.c", Line: 365)