Re: corrupted tuple (header?), pg_filedump output

From:	Eric Parusel <lists(at)globalrelay(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: corrupted tuple (header?), pg_filedump output
Date:	2005-03-19 02:16:45
Message-ID:	423B8B8D.30106@globalrelay.net
Lists:	pgsql-hackers

I've brought this back on-list, probably best that way..?

Eric Parusel wrote:
> Tom Lane wrote:
>
>> What it kinda looks like from here is that you suffered a "page tear":
>> the itemid pointers at the front of the page may be self-consistent, but
>> they don't quite match the state of the rest of the page. For instance
>> the claimed item-2 header is obviously bogus but it looks like there is
>> a valid header starting a few bytes after where the itemid points.
>> I suspect that the itemid pointers are one generation earlier or later
>> than the remainder of the page. Since disks typically write in 512-byte
>> sectors and there is nothing else in the first 512 bytes except the
>> itemids, we could imagine that that sector got written and then the rest
>> of the page did not. Postgres is supposed to protect against this sort
>> of thing in case of a system crash, but I wouldn't want to swear that
>> the protections are completely bulletproof. Have you had any power
>> failures or system crashes lately? What sort of hardware and OS is this
>> on?
>
>
> Hmm...
> Here is some system information:
>
> Dell PE1750, 2GB ECC RAM, 2x73GB 10K SCSI attached to Perc4/di
> (RAID-on-motherboard, LSI MegaRAID chipset, battery-backed cache,
> write-back cache enabled); firmware and drivers are up to date as of a
> month ago.
>
> The OS is RHEL3, kept up to date with the newest kernel for it.
>
> PostgreSQL 8.0.1, installed from the RPMs on postgresql.org; it initially
> ran 8.0.0 from the PGDG RPMs until 8.0.1 came out.
>
> No power failures or crashes since it's been up...
>
> It's been up and running with moderate to heavy load for about 2 months
> now.
>
> I don't think there have been any pgsql backend (if that's the word for
> them) processes crashing or anything of that sort...
>
> Pretty heavy write load on the box; it will be getting a 14-disk RAID 10
> array plugged into it soon to speed things up.
>
>
>
> I can't remember and I couldn't find it, but is there a consistency
> checking tool (pg_fsck or something?) for pgsql? Or I suppose a dump of
> the whole database (which I do nightly) ensures all the data is readable...
>
> If there's anything else I can do to help figure this out, let me know..
>
> Thanks,
> Eric
>
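
To make the "page tear" idea above concrete, here is a minimal sketch of
what it would look like if only the first 512-byte sector of a page
reached disk. The 8192-byte page size is an assumption (the default
block size); torn_write and the filler bytes are made up purely for
illustration and are not anything from PostgreSQL itself.

```python
SECTOR = 512   # sector size mentioned in the quoted message
PAGE = 8192    # assumption: default PostgreSQL block size


def torn_write(old_page: bytes, new_page: bytes, sectors_written: int) -> bytes:
    """Return what is on disk if only the first `sectors_written`
    sectors of new_page were written over old_page."""
    assert len(old_page) == PAGE and len(new_page) == PAGE
    cut = sectors_written * SECTOR
    # New bytes up to the cut, stale bytes from the previous version after it.
    return new_page[:cut] + old_page[cut:]


# Hypothetical page images: "A" is the older generation, "B" the newer one.
old = b"A" * PAGE
new = b"B" * PAGE
torn = torn_write(old, new, 1)

print(PAGE // SECTOR, "sectors per page")
print("first sector is new:", torn[:SECTOR] == new[:SECTOR])
print("rest is old:", torn[SECTOR:] == old[SECTOR:])
```

In such a page the front sector and the rest belong to different
generations, which is the mismatch between itemids and tuple data
described above.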

How would I go about double-checking that I don't have this problem on
other pages? As above, would a successful dump of the database verify that
every table page is readable? And I suppose a dump and reload after that
point would verify that my indexes and anything else in base/ are fine?

How would I figure out where and how much to overwrite with dd if I were
to clear this page? Or how would I set the invalid item's itemid to empty?
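
For the dd part, my rough guess at the arithmetic is sketched below. It
assumes the default 8192-byte block size and 1GB segment files (131072
blocks each); the helper names, the file path and the block number are
hypothetical, and I haven't confirmed that this is the recommended way
to do it. It only prints a command, it doesn't run anything.

```python
BLCKSZ = 8192            # assumption: default block size
RELSEG_BLOCKS = 131072   # assumption: 1GB segment files / 8192-byte blocks


def locate_block(block_number: int):
    """Map a relation block number to (segment, block in segment, byte offset)."""
    segment, block_in_segment = divmod(block_number, RELSEG_BLOCKS)
    return segment, block_in_segment, block_in_segment * BLCKSZ


def dd_zero_command(relfile_path: str, block_number: int) -> str:
    """Build (but do not run) a dd command that zeroes exactly one page."""
    segment, block_in_segment, _ = locate_block(block_number)
    # Segments after the first are stored as <file>.1, <file>.2, ...
    path = relfile_path if segment == 0 else f"{relfile_path}.{segment}"
    # conv=notrunc keeps dd from truncating the file after the written block.
    return (f"dd if=/dev/zero of={path} bs={BLCKSZ} "
            f"seek={block_in_segment} count=1 conv=notrunc")


# Hypothetical relation file and block number, for illustration only.
print(locate_block(3))
print(dd_zero_command("base/1/2", 3))
```

Presumably the postmaster should be stopped and the file copied
somewhere safe before trying anything like this.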

Obviously, stuff like this tends not to be in the documentation :D

Thanks for the help,
Eric

In response to

Re: corrupted tuple (header?), pg_filedump output at 2005-03-18 01:26:10 from Tom Lane
