Re: ERROR: invalid page in block 1226710 of relation base/16750/27244

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: bricklen <bricklen(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: invalid page in block 1226710 of relation base/16750/27244
Date: 2015-10-21 18:46:24
Message-ID: 22806.1445453184@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

bricklen <bricklen(at)gmail(dot)com> writes:
> We have run into some corruption in one of our production tables. We know
> the cause (a compute node was moved), but now we need to fix the data. We
> have backups, but at this point they are nearly a day old, so recovering
> from them is a last-resort and will incur significant downtime.
> We are running 9.3.9

> Following the steps at
> http://blog.endpoint.com/2010/06/tracking-down-database-corruption-with.html

> I get the following output for ctid, id, other_id, tstamp:

> (690651,42) | 318698967 | 347978007 | 2015-10-20 01:55:41.757+00
> (690651,43) | 318698968 | 347978008 | 2015-10-20 01:55:41.663+00
> (690651,44) | 318698969 | 347978009 | 2015-10-20 01:55:42.005+00
> ERROR: invalid page in block 1226710 of relation base/16750/27244

> It appears 690652 is what would be dd'd if that's the route we take. Is
> that accurate?

I'm confused by the block mentioned in the error message not having
anything to do with the TID sequence. I wonder whether it refers to an
index not the table proper. What query were you using to get this output,
exactly? Have you confirmed which relation has relfilenode 27244?

> Because the message indicates the corruption is in the table's page, not
> the page header, according to the docs zero_damaged_pages probably won't
> work.

I should think that zero_damaged_pages would work fine, if the problem
is indeed in the base table. But if it's in an index, a REINDEX would be
a better answer.

Another thing to keep in mind here, if you've got replication slaves,
is that I'm not sure whether the effects of zero_damaged_pages would
propagate to slaves. Have you investigated the possibility that some
slave has an uncorrupt copy that you could dd into place in the master?

> Is this the correct command if option #2 is chosen? Can it be executed
> against a running cluster?
> dd if=/dev/zero of=database/16750/27244 bs=8192 seek=690652 count=1
> conv=notrunc

Uh, no, you're not accounting for the fact that such an offset wouldn't be
in the first segment file of the relation.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2015-10-21 18:50:39 Re: temporary indexes?
Previous Message Jonathan Vanasco 2015-10-21 18:43:56 temporary indexes?