From: | Eric Parusel <lists(at)globalrelay(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: corrupted tuple (header?), pg_filedump output |
Date: | 2005-03-19 02:16:45 |
Message-ID: | 423B8B8D.30106@globalrelay.net |
Lists: | pgsql-hackers |
I've brought this back on-list, probably best that way..?
Eric Parusel wrote:
> Tom Lane wrote:
>
>> What it kinda looks like from here is that you suffered a "page tear":
>> the itemid pointers at the front of the page may be self-consistent, but
>> they don't quite match the state of the rest of the page. For instance
>> the claimed item-2 header is obviously bogus but it looks like there is
>> a valid header starting a few bytes after where the itemid points.
>> I suspect that the itemid pointers are one generation earlier or later
>> than the remainder of the page. Since disks typically write in 512-byte
>> sectors and there is nothing else in the first 512 bytes except the
>> itemids, we could imagine that that sector got written and then the rest
>> of the page did not. Postgres is supposed to protect against this sort
>> of thing in case of a system crash, but I wouldn't want to swear that
>> the protections are completely bulletproof. Have you had any power
>> failures or system crashes lately? What sort of hardware and OS is this
>> on?
>
>
> Hmm...
> Here is some system information:
>
> Dell PE1750, 2GB ECC ram, 2x73GB 10K scsi attached to Perc4/di
> (raid-on-motherboard, LSI megaraid chipset, battery-backed cache,
> write-back cache enabled), firmware/drivers are up to date as of a month
> ago.
>
> The OS is RHEL3, kept up to date with the newest kernel for it.
>
> PgSQL 8.0.1 installed from RPMs on postgresql.org, it had 8.0.0
> installed from PGDG RPMs initially until 8.0.1 came out.
>
> No power failures or crashes since it's been up...
>
> It's been up and running with moderate to heavy load for about 2 months
> now.
>
> I don't think there have been any pgsql backend (if that's the word for
> them) processes crashing or anything of that sort...
>
> Pretty heavy write load on the box, it will be getting a 14 disk raid10
> array plugged into it soon to speed things up.
>
>
>
> I can't remember and I couldn't find it, but is there a consistency
> checking tool (pg_fsck or something?) for pgsql? Or I suppose a dump of
> the whole database (which I do nightly) ensures all the data is readable...
>
> If there's anything else I can do to help figure this out, let me know..
>
> Thanks,
> Eric
>
How would I go about double-checking that I don't have this problem on
other pages? As above, would a successful db dump verify everything's fine?
I suppose a dump and reload after that point would verify that my
indexes and anything else in base/ are fine?
How would I figure out where, and how much, to overwrite with dd if I were
to clear this page? Or how would I set the invalid item's itemid to empty?
Obviously, stuff like this tends not to be in the documentation :D
Thanks for the help,
Eric
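
For reference, the "where and how much" part of the dd question comes down to offset arithmetic. The sketch below is only an illustration, not a recovery procedure from the thread. It assumes the default 8192-byte block size and 1 GB relation segment files (both are build-time settings, so check your installation), plus the 512-byte sectors Tom mentions. The function names and block number 1234 are hypothetical.

```python
# Assumptions (not stated in the thread): default PostgreSQL build settings.
BLOCK_SIZE = 8192            # bytes per page (BLCKSZ default)
BLOCKS_PER_SEGMENT = 131072  # 1 GB segment files at 8192-byte blocks
SECTOR_SIZE = 512            # disk sector size mentioned in the thread


def locate_block(block_number):
    """Map a relation block number to (segment index, block within segment, byte offset)."""
    if block_number < 0:
        raise ValueError("block number must be non-negative")
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    return segment, block_in_segment, block_in_segment * BLOCK_SIZE


def sectors_for_block(block_in_segment):
    """Return (first sector index, sector count) covered by one page in its file."""
    first_sector = block_in_segment * BLOCK_SIZE // SECTOR_SIZE
    return first_sector, BLOCK_SIZE // SECTOR_SIZE


def dd_arguments(block_in_segment):
    """Parameters that would address exactly one page with dd.

    conv=notrunc matters: without it dd truncates the output file.
    """
    return {"bs": BLOCK_SIZE, "seek": block_in_segment, "count": 1, "conv": "notrunc"}


if __name__ == "__main__":
    segment, block, offset = locate_block(1234)  # hypothetical block number
    print(segment, block, offset)
    print(sectors_for_block(block))
    print(dd_arguments(block))
```

With these assumptions a page spans 16 sectors, which is why a partial write can leave the first sector (holding the itemids) out of step with the rest of the page. Any actual overwrite would need the server stopped and a copy of the file taken first.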