Re: page corruption bug

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "A Palmblad" <adampalmblad(at)yahoo(dot)ca>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: page corruption bug
Date: 2004-04-12 21:45:10
Message-ID: 14110.1081806310@sss.pgh.pa.us
Lists: pgsql-bugs

"A Palmblad" <adampalmblad(at)yahoo(dot)ca> writes:
> We are having a recurring problem with page corruption in our
> database.

The symptoms you describe are indistinguishable from those seen with
flaky hardware. I'd strongly suggest doing more extensive testing of
both RAM and disks. memtest86 and badblocks are the least common
denominator for test programs, though I think you can get better ones
if you're willing to pay. (In particular, I do not know if memtest86
can reach all of RAM in a 64-bit machine; it may be 32-bit-only...)

The software setup (dual AMDs and a 64-bit compile) is a bit off the
beaten track, but if you did have a porting problem these are not the
sort of symptoms I'd expect. My money is on a hardware fault.

I'll even go out on a limb and suggest that it's probably bad RAM rather
than drives; the behavior seems consistent with flaky RAM in an address
range that doesn't get used until the kernel has managed to fill up most
of memory.

> Another error was just noted, reading as follows: ERROR: Couldn't open
> segment 1 of relation: XXXX (target block 746874992): No such file or directory.

Likely explanation is a trashed block pointer in an index entry. Again,
not too surprising if hardware is flaky.
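
To see why that block number looks trashed rather than real, here is a rough
sketch of the arithmetic. It assumes the stock build settings of 8 kB pages
and 1 GB segment files (131072 blocks per segment); the names BLCKSZ and
RELSEG_SIZE follow the usual compile-time constants, and locate_block is a
hypothetical helper written only for this illustration. A custom build could
use different values.

```python
# Assumed defaults; a nonstandard build may differ.
BLCKSZ = 8192          # bytes per page
RELSEG_SIZE = 131072   # pages per segment file (1 GB at 8 kB pages)


def locate_block(block_number, blcksz=BLCKSZ, relseg_size=RELSEG_SIZE):
    """Return (segment number, block within segment, byte offset in relation)."""
    segment = block_number // relseg_size
    block_in_segment = block_number % relseg_size
    byte_offset = block_number * blcksz
    return segment, block_in_segment, byte_offset


# Block number taken from the quoted error message.
segment, block_in_segment, byte_offset = locate_block(746874992)
print(f"segment {segment}, block {block_in_segment} within that segment")
print(f"relation would need to be over {byte_offset / 2**40:.2f} TiB")
```

Under those assumptions the target block would sit thousands of segment files
into a multi-terabyte relation, so a failure to open even segment 1 is what
one would expect if the block number itself is garbage.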

regards, tom lane
