Quick Links

Re: ERROR: invalid page in block 1226710 of relation base/16750/27244

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	bricklen <bricklen(at)gmail(dot)com>
Cc:	"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: ERROR: invalid page in block 1226710 of relation base/16750/27244
Date:	2015-10-21 18:46:24
Message-ID:	22806.1445453184@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

bricklen <bricklen(at)gmail(dot)com> writes:
> We have run into some corruption in one of our production tables. We know
> the cause (a compute node was moved), but now we need to fix the data. We
> have backups, but at this point they are nearly a day old, so recovering
> from them is a last-resort and will incur significant downtime.
> We are running 9.3.9

> Following the steps at
> http://blog.endpoint.com/2010/06/tracking-down-database-corruption-with.html

> I get the following output for ctid, id, other_id, tstamp:

> (690651,42) | 318698967 | 347978007 | 2015-10-20 01:55:41.757+00
> (690651,43) | 318698968 | 347978008 | 2015-10-20 01:55:41.663+00
> (690651,44) | 318698969 | 347978009 | 2015-10-20 01:55:42.005+00
> ERROR: invalid page in block 1226710 of relation base/16750/27244

> It appears 690652 is what would be dd'd if that's the route we take. Is
> that accurate?

I'm confused by the block mentioned in the error message not having
anything to do with the TID sequence. I wonder whether it refers to an
index not the table proper. What query were you using to get this output,
exactly? Have you confirmed which relation has relfilenode 27244?

> Because the message indicates the corruption is in the table's page, not
> the page header, according to the docs zero_damaged_pages probably won't
> work.

I should think that zero_damaged_pages would work fine, if the problem
is indeed in the base table. But if it's in an index, a REINDEX would be
a better answer.

Another thing to keep in mind here, if you've got replication slaves,
is that I'm not sure whether the effects of zero_damaged_pages would
propagate to slaves. Have you investigated the possibility that some
slave has an uncorrupt copy that you could dd into place in the master?

> Is this the correct command if option #2 is chosen? Can it be executed
> against a running cluster?
> dd if=/dev/zero of=database/16750/27244 bs=8192 seek=690652 count=1
> conv=notrunc

Uh, no, you're not accounting for the fact that such an offset wouldn't be
in the first segment file of the relation.

regards, tom lane

In response to

ERROR: invalid page in block 1226710 of relation base/16750/27244 at 2015-10-21 18:16:41 from bricklen

Responses

Re: ERROR: invalid page in block 1226710 of relation base/16750/27244 at 2015-10-21 19:31:13 from bricklen

Browse pgsql-general by date

	From	Date	Subject
Next Message	Adrian Klaver	2015-10-21 18:50:39	Re: temporary indexes?
Previous Message	Jonathan Vanasco	2015-10-21 18:43:56	temporary indexes?