Re: ERROR: invalid page in block 1226710 of relation base/16750/27244

From: bricklen <bricklen(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: invalid page in block 1226710 of relation base/16750/27244
Date: 2015-10-21 19:31:13
Message-ID: CAGrpgQ_y9W5fY2T26bofUE=ZJdiF6262=ReNTZWv=2dsxC3VEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Tom,

On Wed, Oct 21, 2015 at 11:46 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> bricklen <bricklen(at)gmail(dot)com> writes:
> > I get the following output for ctid, id, other_id, tstamp:
>
> > (690651,42) | 318698967 | 347978007 | 2015-10-20 01:55:41.757+00
> > (690651,43) | 318698968 | 347978008 | 2015-10-20 01:55:41.663+00
> > (690651,44) | 318698969 | 347978009 | 2015-10-20 01:55:42.005+00
> > ERROR: invalid page in block 1226710 of relation base/16750/27244
>
> > It appears 690652 is what would be dd'd if that's the route we take. Is
> > that accurate?
>
> I'm confused by the block mentioned in the error message not having
> anything to do with the TID sequence. I wonder whether it refers to an
> index not the table proper. What query were you using to get this output,
> exactly? Have you confirmed which relation has relfilenode 27244?
>

Yes, it is definitely a table. There was originally an index on that table
which threw the original error (about sibling mismatch). I dropped the
index and attempted to recreate it, which failed. Further investigation led
to discovery of corruption in the table.

>
> > Because the message indicates the corruption is in the table's page, not
> > the page header, according to the docs zero_damaged_pages probably won't
> > work.
>
> I should think that zero_damaged_pages would work fine, if the problem
> is indeed in the base table.
>

I will make note of that.

>
> Another thing to keep in mind here, if you've got replication slaves,
> is that I'm not sure whether the effects of zero_damaged_pages would
> propagate to slaves. Have you investigated the possibility that some
> slave has an uncorrupt copy that you could dd into place in the master?
>

We do have one uncorrupted slave, and one corrupted. I have a 4 hour
delayed WAL-apply script that runs on the primary slaves in the disaster
recovery data centres, and I stopped that process as soon as I saw the
error about the sibling mismatch on the master. It is a viable candidate to
fail over to, if we can swing a 20+ hour window of data loss. Right now
that is an undesirable option.

>
> > Is this the correct command if option #2 is chosen? Can it be executed
> > against a running cluster?
> > dd if=/dev/zero of=database/16750/27244 bs=8192 seek=690652 count=1
> > conv=notrunc
>
> Uh, no, you're not accounting for the fact that such an offset wouldn't be
> in the first segment file of the relation.
>

Hmm, I wasn't sure about that. Thanks for confirming that.

As it stands, my next step is going to be a pg_dump of one of the
up-to-date slaves (with corruption) but I will exclude the bad table. Given
that I know the PK id range, I can COPY out the table's contents before and
after the affected data. This way we can at least recover from backup if
things get entirely borked.

The next part of the plan is to create a temporary version of the table
with all data other than the corrupted range, then do some transaction-fu
to rename the tables.

Thank you for your response, and any other insights are gratefully received.

Cheers,

Bricklen

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message melvin6925 2015-10-21 19:39:14 Re: temporary indexes?
Previous Message Jonathan Vanasco 2015-10-21 19:27:37 Re: temporary indexes?