Re: trying to run PITR recovery

From: Warren Little <Warren(dot)Little(at)MeridiasCapital(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: trying to run PITR recovery
Date: 2007-03-30 14:23:06
Message-ID: 5F414061-9F03-4580-9CD7-1458BA942530@MeridiasCapital.com
Lists: pgsql-admin

Simon,
I have no issues with how the error was handled, just the
notification that an error was encountered.

> @ 2007-03-23 05:57:33 MDTLOG: restored log file
> "000000010000011A000000FD" from archive
> @ 2007-03-23 05:57:35 MDTLOG: incorrect resource manager data
> checksum in record at 11A/FD492B20
> @ 2007-03-23 05:57:35 MDTLOG: redo done at 11A/FD492210

The first message says it restored the file, and the second looks
like an error, but for someone like me, who runs this process very
seldom, it's hard to tell exactly what transpired.
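
One way to read the three lines together is to check that the failing
record really lies in the segment that had just been restored. A minimal
Python sketch, assuming the default 16 MB segment size and the
"log/offset" form of the location shown in the log;
segment_for_location is a hypothetical helper, not a PostgreSQL tool:

```python
# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def segment_for_location(location, timeline=1):
    """Return the WAL segment file name holding a location like '11A/FD492B20'.

    The timeline defaults to 1, matching the file names in the log above.
    """
    log_hex, offset_hex = location.split("/")
    log_id = int(log_hex, 16)
    offset = int(offset_hex, 16)
    # Each segment covers WAL_SEGMENT_SIZE bytes of the log file.
    segment = offset // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, log_id, segment)


# Location of the bad checksum and of the last record that was replayed.
print(segment_for_location("11A/FD492B20"))
print(segment_for_location("11A/FD492210"))
```

Both locations map to the same file name as the "restored log file"
line, which would suggest the checksum failure was hit inside that
segment and that replay ended a little before it.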

On a slightly different topic, is there some way to determine the
timeline of the corrupted segment, i.e. what was the original time of
the last restored transaction?
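
As far as I can tell, the only "timeline" the file name itself carries
is the timeline ID in its first eight hex digits, which is not a
wall-clock time. A small Python sketch of splitting a segment name into
its parts, assuming the usual 24-hex-digit layout;
parse_segment_name is a hypothetical helper:

```python
import re

# Assumption: segment names are exactly 24 upper-case hex digits,
# laid out as timeline ID, log file number, segment number (8 digits each).
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def parse_segment_name(name):
    """Split a WAL segment file name into timeline, log and segment numbers."""
    if not SEGMENT_NAME.match(name):
        raise ValueError("not a 24-digit WAL segment name: %r" % name)
    return {
        "timeline": int(name[0:8], 16),
        "log": int(name[8:16], 16),
        "segment": int(name[16:24], 16),
    }


print(parse_segment_name("000000010000011A000000FD"))
```

This only identifies which timeline and position the segment belongs
to; the time of the last restored transaction would have to come from
somewhere else.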

On Mar 30, 2007, at 5:16 AM, Simon Riggs wrote:

> On Fri, 2007-03-23 at 17:16 -0600, Warren Little wrote:
>
>> My concern is that there were many more logfiles to be played
>> following "000000010000011A000000FD"
>> (i.e. 000000010000011E0000005C), yet it appears the recovery stopped
>> at that point and called it good.
>> I would assume all WAL logs would be restored.
>
> I'm interested in your feedback here. How would you like it to have
> acted?
>
> The WAL file was clearly corrupt.
>
> 1. Don't continue and don't come up. Have the recovery fail. In order
> to bring the server up, we would have to restart recovery with an
> additional command to say "I note that my recovery has failed and
> would like recovery to come up at the last possible point."
>
> 2. Attempt to continue after we fail the CRC check. This is both
> dangerous and in many cases won't work either, since this is one of
> the normal ending points.
>
> 3. Continue after a CRC check, don't attempt to apply the records,
> just look at them to determine if they look correct, i.e. see if the
> CRC error applies to just that record.
>
> 4. Add a command to ignore specific WAL records
> ignore_record = '11A/FD492B20'
>
> These may also not work very well at all, since many records depend
> upon previous data changes, so could quickly end in further errors.
>
> What would you suggest?
>
> --
> Simon Riggs
> EnterpriseDB http://www.enterprisedb.com
>
>

Warren Little
Chief Technology Officer
Meridias Capital Inc
ph 866.369.7763
