Re: Allow WAL information to recover corrupted pg_controldata

From: Cédric Villemain <cedric(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, "'Robert Haas'" <robertmhaas(at)gmail(dot)com>
Subject: Re: Allow WAL information to recover corrupted pg_controldata
Date: 2012-06-15 20:49:50
Message-ID: 201206152249.50960.cedric@2ndquadrant.com
Lists: pgsql-hackers

On Friday, 15 June 2012 at 03:27:11, Amit Kapila wrote:
> > I guess my first question is: why do we need this? There are lots of
> > things in the TODO list that someone wanted once upon a time, but
> > they're not all actually important. Do you have reason to believe
> > that this one is? It's been six years since that email, so it's worth
> > asking if this is actually relevant.
>
> As far as I know, pg_control is not WAL-protected, which means that if it
> gets corrupted for any reason (for example, a disk crash during a flush that
> leaves it partially written), recovery of the database might fail.

AFAIR pg_controldata fits in a single disk sector, so it cannot be half-written.

> In that case the user can use pg_resetxlog to recover the database. Currently
> pg_resetxlog works on guessed values for pg_control.
> This implementation would improve on that: instead of guessing, it would try
> to regenerate the values from WAL.
> This can allow better recovery in certain circumstances.
>
> > The deadline for patches for this CommitFest is today, so I think you
> > should target any work you're starting now for the NEXT CommitFest.
>
> Oh, I am sorry; as this is my first time, I was not fully aware of the
> deadline.
>
> However, I would still like your opinion on whether it makes sense to work
> on this feature.
>
>
> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
> Sent: Friday, June 15, 2012 12:40 AM
> To: Amit Kapila
> Cc: pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Allow WAL information to recover corrupted
> pg_controldata
>
> On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> > I am planning to work on the below Todo list item for this CommitFest
> > Allow WAL information to recover corrupted pg_controldata
> > http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php
>
> The deadline for patches for this CommitFest is today, so I think you
> should target any work you're starting now for the NEXT CommitFest.
>
> > I wanted to confirm my understanding about the work involved for this
> > patch:
> > The existing patch has the following problems:
> > 1. There is a memory leak, and the linked-list code path is not correct.
> > 2. The lock check for whether the server is already running was removed
> > in the patch; that removal needs to be reverted.
> > 3. The code needs refactoring.
> >
> > Apart from the above, what I understood from the patch is that its
> > intention is to generate values for ControlFile from the WAL when the -r
> > option is used.
> >
> > The change from the current algorithm will be: if the control file is
> > corrupt, which essentially means ReadControlFile() returns false, then it
> > should generate the values (checkPointCopy, checkPoint, prevCheckPoint,
> > state) from the WAL, provided the -r option is enabled.
> >
> > Also, for the -r option, it doesn't need to call FindEndOfXLOG(), as
> > that work will be achieved by the above point.
> >
> > It will just rewrite the control file and won't do other resets.
> >
> >
> > The algorithm for restoring the pg_control values from old xlog files:
> > 1. Retrieve all of the active xlog files from the xlog directory into a
> > list in increasing order, according to their timeline, log id, segment id.
> > 2. Search the list to find the oldest xlog file of the latest timeline.
> > 3. Search the records from the oldest xlog file of the latest timeline to
> > the latest xlog file of the latest timeline; whenever a checkpoint
> > record is found, update the latest checkpoint and previous checkpoint.
>
> > Apart from the above, some code changes will be required after the Xlog
> > patch by Heikki.
> >
> > Please let me know whether my understanding is correct.
>
> I guess my first question is: why do we need this? There are lots of
> things in the TODO list that someone wanted once upon a time, but
> they're not all actually important. Do you have reason to believe
> that this one is? It's been six years since that email, so it's worth
> asking if this is actually relevant.
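
For illustration only, steps 1 and 2 of the algorithm quoted above could be sketched as follows. This is not code from the patch: it is a minimal Python sketch that assumes segment file names of 24 uppercase hexadecimal digits (8 for the timeline, 8 for the log id, 8 for the segment id), and the names `Segment`, `parse_segment` and `segments_of_latest_timeline` are hypothetical. Step 3 (scanning records for checkpoints) needs a WAL record parser and is not attempted here.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumption: a WAL segment file name is exactly 24 uppercase hex digits.
_SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


class Segment(NamedTuple):
    """Hypothetical record; tuple order gives the sort order of step 1."""
    timeline: int
    log_id: int
    seg_id: int
    name: str


def parse_segment(name: str) -> Optional[Segment]:
    """Split a segment file name into its three numeric parts.

    Returns None for anything that is not a segment file
    (history files, subdirectories, and so on).
    """
    if not _SEGMENT_NAME.match(name):
        return None
    return Segment(
        timeline=int(name[0:8], 16),
        log_id=int(name[8:16], 16),
        seg_id=int(name[16:24], 16),
        name=name,
    )


def segments_of_latest_timeline(names: Iterable[str]) -> List[Segment]:
    """Steps 1 and 2: sort all segments, keep those of the latest timeline.

    The first element of the result is the oldest segment of the latest
    timeline, which is where a checkpoint scan (step 3) would start.
    """
    segments = sorted(s for s in map(parse_segment, names) if s is not None)
    if not segments:
        return []
    latest_timeline = segments[-1].timeline
    return [s for s in segments if s.timeline == latest_timeline]
```

In practice the names would come from something like `os.listdir()` on the xlog directory; the first element of the returned list is then the starting point for the scan described in step 3.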

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: 24x7 Support - Development, Expertise and Training
