Re: Allow WAL information to recover corrupted pg_controldata

From: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
To: Cédric Villemain <cedric(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>
Subject: Re: Allow WAL information to recover corrupted pg_controldata
Date: 2012-06-16 05:58:42
Message-ID: 6C0B27F7206C9E4CA54AE035729E9C382850626B@szxeml509-mbx
Lists: pgsql-hackers


> > > I guess my first question is: why do we need this? There are lots of
> > > things in the TODO list that someone wanted once upon a time, but
> > > they're not all actually important. Do you have reason to believe
> > > that this one is? It's been six years since that email, so it's worth
> > > asking if this is actually relevant.
> >
> > As far as I know, pg_control is not WAL-protected, which means that if it
> > gets corrupted for any reason (for example, a disk crash during a flush
> > that leaves it partially written), recovery of the database might fail.

> AFAIR pg_controldata fits in a single disk sector, so it cannot be half written.
It can become corrupt for other reasons as well, such as a torn disk sector.
pg_resetxlog already has a mechanism for recovering a corrupt pg_control file, so the possibility of corruption is already acknowledged.
The suggested patch improves the logic for recovering a corrupt control file, which is why I felt it would be relevant to work on this patch.
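
As a rough illustration of the restore algorithm quoted further down in this mail (order the xlog files, pick the oldest file of the latest timeline, then track checkpoints while scanning), here is a minimal Python sketch. It is not taken from the patch. It assumes the usual 24-hex-digit segment names (timeline, log id, segment id), and the record stream passed to track_checkpoints() is a hypothetical list of (location, is_checkpoint) pairs standing in for real WAL record parsing.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

# WAL segment names are 24 hex digits: 8 for timeline, 8 for log id, 8 for segment id.
SEGMENT_NAME = re.compile(r"[0-9A-F]{24}")


class Segment(NamedTuple):
    timeline: int
    log_id: int
    seg_id: int
    name: str


def parse_segments(file_names: Iterable[str]) -> List[Segment]:
    """Step 1: keep only segment files, sorted by (timeline, log id, segment id)."""
    segments = []
    for name in file_names:
        if SEGMENT_NAME.fullmatch(name):
            segments.append(
                Segment(int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16), name)
            )
    return sorted(segments)


def latest_timeline_segments(segments: List[Segment]) -> List[Segment]:
    """Step 2: segments of the newest timeline, oldest first (expects sorted input)."""
    if not segments:
        return []
    newest = segments[-1].timeline
    return [s for s in segments if s.timeline == newest]


def track_checkpoints(
    records: Iterable[Tuple[int, bool]]
) -> Tuple[Optional[int], Optional[int]]:
    """Step 3: scan records in order, remembering the latest and previous checkpoint.

    `records` is a hypothetical stream of (location, is_checkpoint) pairs.
    """
    latest = previous = None
    for location, is_checkpoint in records:
        if is_checkpoint:
            previous, latest = latest, location
    return latest, previous


names = [
    "000000010000000000000003",
    "000000020000000000000004",
    "000000020000000000000005",
    "00000002.history",   # not a segment file, ignored
    "archive_status",     # not a segment file, ignored
]
to_scan = latest_timeline_segments(parse_segments(names))
print(to_scan[0].name)  # oldest segment of the latest timeline
```

The scan would start at the first entry of to_scan and run through the last one; only the two most recent checkpoint locations need to be kept while scanning.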
________________________________________
From: Cédric Villemain [cedric(at)2ndquadrant(dot)com]
Sent: Saturday, June 16, 2012 2:19 AM
To: pgsql-hackers(at)postgresql(dot)org
Cc: Amit kapila; 'Robert Haas'
Subject: Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

On Friday 15 June 2012 03:27:11, Amit Kapila wrote:
> > I guess my first question is: why do we need this? There are lots of
> > things in the TODO list that someone wanted once upon a time, but
> > they're not all actually important. Do you have reason to believe
> > that this one is? It's been six years since that email, so it's worth
> > asking if this is actually relevant.
>
> As far as I know, pg_control is not WAL-protected, which means that if it
> gets corrupted for any reason (for example, a disk crash during a flush
> that leaves it partially written), recovery of the database might fail.

AFAIR pg_controldata fits in a single disk sector, so it cannot be half written.
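
Whatever the cause of the damage, it is noticed through the CRC that pg_control carries over its contents. The following is only a minimal Python sketch of that idea, not the server's code: the trailing 4-byte layout, the helper names pack_control() and control_is_valid(), and the use of zlib.crc32 as the checksum are all assumptions made for illustration.

```python
import struct
import zlib


def pack_control(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload as 4 trailing little-endian bytes (assumed layout)."""
    return payload + struct.pack("<I", zlib.crc32(payload) & 0xFFFFFFFF)


def control_is_valid(blob: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the stored value."""
    if len(blob) < 4:
        return False
    payload = blob[:-4]
    (stored,) = struct.unpack("<I", blob[-4:])
    return (zlib.crc32(payload) & 0xFFFFFFFF) == stored


good = pack_control(b"state=in production;checkpoint=0/2000020")
# Flip one byte to imitate a damaged sector.
damaged = good[:10] + bytes([good[10] ^ 0xFF]) + good[11:]

print(control_is_valid(good), control_is_valid(damaged))
```

A failed check only says that the contents cannot be trusted; it does not say which values are wrong, which is where guessing or regenerating them comes in.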

> In that case the user can use pg_resetxlog to recover the database. Currently
> pg_resetxlog works from guessed values for pg_control.
> This implementation improves on that: instead of guessing, it tries to
> regenerate the values from WAL,
> which can allow better recovery in certain circumstances.
>
> > The deadline for patches for this CommitFest is today, so I think you
> > should target any work you're starting now for the NEXT CommitFest.
>
> Oh, I am sorry; as this is my first time, I was not fully aware of the
> deadline.
>
> However, I would still like your opinion on whether it makes sense to work
> on this feature.
>
>
> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
> Sent: Friday, June 15, 2012 12:40 AM
> To: Amit Kapila
> Cc: pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Allow WAL information to recover corrupted
> pg_controldata
>
> On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> > I am planning to work on the below Todo list item for this CommitFest
> > Allow WAL information to recover corrupted pg_controldata
> > http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php
>
> The deadline for patches for this CommitFest is today, so I think you
> should target any work you're starting now for the NEXT CommitFest.
>
> > I wanted to confirm my understanding about the work involved for this
> > patch:
> > The existing patch has the following problems:
> > 1. It leaks memory, and the linked-list code path is not correct.
> > 2. The lock check for whether the server is already running was removed
> > in the patch; that removal needs to be reverted.
> > 3. The code needs refactoring.
> >
> > Apart from the above, what I understood from the patch is that its
> > intention is to generate the values for ControlFile from the WAL when the
> > -r option is used.
> >
> > The change from the current algorithm is this: if the control file is
> > corrupt, which essentially means that ReadControlFile() returns false,
> > then it should generate the values (checkPointCopy, checkPoint,
> > prevCheckPoint, state) from WAL if the -r option is enabled.
> >
> > Also, with the -r option it doesn't need to call FindEndOfXLOG(), as
> > that work is already done by the step above.
> >
> > It will just rewrite the control file and not do the other resets.
> >
> >
> > The algorithm for restoring the pg_control values from old xlog files:
> > 1. Retrieve all of the active xlog files from the xlog directory into a
> > list in increasing order, according to their timeline, log id, and
> > segment id.
> > 2. Search the list to find the oldest xlog file of the latest timeline.
> > 3. Scan the records from the oldest xlog file of the latest timeline to
> > the latest xlog file of the latest timeline; whenever a checkpoint
> > record is found, update the latest checkpoint and the previous
> > checkpoint.
>
> > Apart from the above, some code changes will be required after the Xlog
> > patch by Heikki.
> >
> > Please let me know if my understanding is correct.
>
> I guess my first question is: why do we need this? There are lots of
> things in the TODO list that someone wanted once upon a time, but
> they're not all actually important. Do you have reason to believe
> that this one is? It's been six years since that email, so it's worth
> asking if this is actually relevant.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training
