Re: Bug #843: pg_clog files problem

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: charlie(at)mail(dot)vega(dot)bg, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Bug #843: pg_clog files problem
Date: 2002-12-10 15:27:55
Message-ID: 25929.1039534075@sss.pgh.pa.us
Lists: pgsql-bugs

pgsql-bugs(at)postgresql(dot)org writes:
> I am running PostgreSQL 7.2.1 distributed with cygwin on
> Windows NT4 SP6.

7.2.1 has known bugs; why are you not testing 7.2.3?

> The data directory is on NTFS drive.

Not sure that we can do much about an inherently unreliable platform
:-(. I do not know whether cygwin (or even bare NT) provides any
write-ordering guarantees --- anyone able to say anything authoritative
about the behavior of fsync on this platform? In any case, there is *no
one* who will claim that cygwin/NT offers the sort of production-grade
stability that you are evidently looking for. Forget that OS and get
yourself a recent Linux release.

But having said all that, I suspect that your real issue is not a
software problem at all, but hardware:

> 2002-12-09 15:43:16 [180] FATAL 2: open of /cygdrive/d/TEMP/data/pg_clog/0677 failed: No such file or directory

We've seen several similar reports of attempted clog access far past the
actual end of clog (i.e., an attempt to determine the commit status of a
garbage transaction number), and in every case where it was possible to
trace the cause, the cause was corrupted data pages on disk. A totally
trashed disk page will often show this failure before any other evidence
of the data corruption appears, just because checking the transaction
numbers in tuple headers is one of the first steps in trying to use the
data.
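
To get a feel for how far "past the end" a given clog file is, here is a
rough sketch that maps a segment file name such as 0677 back to the range
of transaction numbers it would hold. The constants are assumptions, not
something stated in this message: the default 8 kB block size, 2 status
bits per transaction (4 per byte), 32 pages per segment file, and a
segment name that is the segment number in hexadecimal. The function
names are made up for illustration.

```python
# Assumed layout (not taken from the message): default 8 kB block size,
# 4 transaction status entries per byte, 32 pages per clog segment file.
BLCKSZ = 8192
XACTS_PER_BYTE = 4
PAGES_PER_SEGMENT = 32
XACTS_PER_SEGMENT = BLCKSZ * XACTS_PER_BYTE * PAGES_PER_SEGMENT


def clog_segment_xid_range(segment_name):
    """Return (first_xid, last_xid) covered by a clog segment file name.

    Hypothetical helper: assumes the file name is the segment number
    written in hexadecimal.
    """
    segno = int(segment_name, 16)
    first = segno * XACTS_PER_SEGMENT
    return first, first + XACTS_PER_SEGMENT - 1


def clog_segment_for_xid(xid):
    """Return the segment file name that would hold the status of xid."""
    return format(xid // XACTS_PER_SEGMENT, "04X")


if __name__ == "__main__":
    first, last = clog_segment_xid_range("0677")
    print("segment 0677 covers xids", first, "to", last)
```

Under those assumptions, 0677 corresponds to transaction numbers around
1.7 billion, which an installation would reach only after a very large
number of transactions. A lookup that far out is more plausibly a garbage
value read from a damaged tuple header than a real transaction number.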

So I rather suspect that the real issue is your disk drive doesn't
behave well when it loses power while writing. I can't prove it at a
distance, but that's what I'd be looking into if I were you.

If you want to try to learn more from the evidence you have, I'd suggest
trying to identify the trashed data pages to see if there's any pattern
to them. A useful low-level tool for this is pg_filedump from
http://sources.redhat.com/rhdb/tools.html.

regards, tom lane
