Re: pg_clog woes with 7.3.2 - Episode 2

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Kevin Brown <kevin(at)sysexperts(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: pg_clog woes with 7.3.2 - Episode 2
Date:	2003-04-17 03:39:31
Message-ID:	3008.1050550771@sss.pgh.pa.us
Lists:	pgsql-hackers

Kevin Brown <kevin(at)sysexperts(dot)com> writes:
> If badblocks shows errors but you don't see any SCSI errors in the
> system logs, then it's time to start suspecting the disk controller or
> perhaps even the PCI bus controller, because it means something really
> weird is happening on the backend that is entirely invisible. Cabling
> or termination could be an issue, but I'd expect to see parity errors,
> timed out commands, etc. if that's the problem.

Dave neglected to mention that the two or three bad blocks we'd traced
down all showed a consistent pattern of errors: there was a 64-byte
region of wrong data, aligned on a 64-byte offset from the start of the
disk block, and the contents were copies of correct data from positions
exactly 64 bytes before or after the bad area.
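
[Editorial sketch, not part of the original message.] A check for this pattern could look roughly like the following, assuming a known-good copy of the block (say, from a backup) is available to compare against. The function name find_shifted_chunks and its arguments are hypothetical, chosen only for illustration; nothing like it was used in the thread.

```python
CHUNK = 64  # size of the suspect region, per the pattern described above


def find_shifted_chunks(good: bytes, bad: bytes, chunk: int = CHUNK):
    """Compare two copies of one disk block, chunk by aligned chunk.

    Returns a list of (offset, source) pairs, one per chunk of `bad` that
    differs from `good`. `source` is "before" if the bad chunk is a copy of
    the good data one chunk earlier, "after" if it is a copy of the good
    data one chunk later, and "unknown" if it matches neither neighbour.
    """
    if len(good) != len(bad):
        raise ValueError("blocks must be the same length")

    findings = []
    for off in range(0, len(good), chunk):
        got = bad[off:off + chunk]
        if got == good[off:off + chunk]:
            continue  # this chunk is intact

        # Neighbouring chunks of the known-good data (None/empty at the edges).
        prev_chunk = good[off - chunk:off] if off >= chunk else None
        next_chunk = good[off + chunk:off + 2 * chunk]

        if got == prev_chunk:
            findings.append((off, "before"))
        elif got == next_chunk:
            findings.append((off, "after"))
        else:
            findings.append((off, "unknown"))
    return findings
```

If every damaged chunk comes back as "before" or "after" rather than "unknown", the corruption is consistent with data being moved in 64-byte units and landing one unit off, rather than with random bit errors.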

Considering that, I would bet a good deal that the problem is some kind
of transfer timing error in some chunk of hardware that copies the data
64 bytes at a time. I withdraw my previous thought that it might be
cabling --- there are no 64-byte-wide SCSI cables. It could easily be
internal to the SCSI adaptor though. If his motherboard is high-end
enough that the DMA path from adaptor to memory is 64 bytes wide, then
DMA timing errors would be a possibility too.

regards, tom lane
