Re: BUG #3484: Missing pg_clog file / corrupt index

From: "Feng Chen" <fchen(at)covergence(dot)com>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Marc Schablewski" <ms(at)clickware(dot)de>
Cc: "Decibel!" <decibel(at)decibel(dot)org>, <pgsql-bugs(at)postgresql(dot)org>, "Feng Chen" <fchen(at)covergence(dot)com>
Subject: Re: BUG #3484: Missing pg_clog file / corrupt index
Date: 2007-08-24 13:19:49
Message-ID: 0D1719326D64BD4E9F92A0C120237678029B8904@eserv.covergence.com
Lists: pgsql-bugs

We have exactly the same problem: every 5 to 10 days, once the data grows
to a certain size, PostgreSQL complains about missing pg_clog files and
invalid page headers during vacuum or reindex operations.

The problem happens on different customer sites with Linux 2.6.11.

There is one particular table that is much more heavily used - 99%
inserts, some selects - and this table turns out to be the one having
problems.

Since this happens so frequently at different companies, I doubt it is a
hardware problem. We did check the hardware and found no problems.

Here are some sample messages:

2007-07-07T10:11:38+00:00 ERROR: could not access status of transaction 842085945
2007-07-07T10:11:38+00:00 DETAIL: could not open file "pg_clog/0323": No such file or directory

2007-08-07T08:10:23+00:00 ERROR: could not access status of transaction 1481866610
2007-08-07T08:10:24+00:00 DETAIL: could not open file "pg_clog/0585": No such file or directory
2007-08-07T08:13:55+00:00 ERROR: invalid page header in block 346965 of relation "sipmessage"
2007-08-07T08:30:16+00:00 ERROR: could not access status of transaction 1481866610
2007-08-07T08:30:16+00:00 DETAIL: could not open file "pg_clog/0585": No such file or directory
2007-08-07T08:34:08+00:00 ERROR: invalid page header in block 346965 of relation "sipmessage"
2007-08-07T08:51:02+00:00 ERROR: could not access status of transaction 1481866610
2007-08-07T08:51:02+00:00 DETAIL: could not open file "pg_clog/0585": No such file or directory

2007-08-13T10:12:07+00:00 ERROR: invalid page header in block 4018 of relation "calllegstart_sessionid"
2007-08-13T10:12:15+00:00 ERROR: could not access status of transaction 0
2007-08-13T10:12:15+00:00 DETAIL: could not create file "pg_subtrans/0201": File exists

-----Original Message-----
From: pgsql-bugs-owner(at)postgresql(dot)org
[mailto:pgsql-bugs-owner(at)postgresql(dot)org] On Behalf Of Gregory Stark
Sent: Tuesday, July 31, 2007 7:00 AM
To: Marc Schablewski
Cc: Decibel!; pgsql-bugs(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #3484: Missing pg_clog file / corrupt index

"Marc Schablewski" <ms(at)clickware(dot)de> writes:

> I kept a copy of the data files in case it is needed, but I have to
> check first, if I am allowed to give away that information. Some of the
> data is confidential. If you just need the files containing the damaged
> table, this won't be a big problem, because it does not contain any
> confidential information (as long as one data file only contains the
> data of one table). The other problem is the size of the files. The
> whole database is about 60GB and the files belonging to that table are
> about 2.5GB. Maybe there is a way to pre-select the data you need?

Perhaps. You could find the records with unreasonable values. But I don't
think there's any convenient way to find the records which produce the clog
error or which are missing, unless they turn out to be on the same page.

Out of curiosity, what do the unreasonable values look like?

Earlier you said:

> We narrowed it down to a few records in that table. Some records contain
> unreasonable values, others produce the same message about the missing
> pg_clog file when selected and some are simply missing. But they must have
> existed, because there are still records in a second table referencing them.

If you still have a live database with this data, then running

SELECT ctid FROM tab WHERE ...

for the records with unreasonable values might tell you which blocks are
corrupted. The value before the comma is the block number, which when
multiplied by 8192 (assuming you're using 8k blocks) gives the file offset
at which to look for the page.
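(That offset arithmetic can be sketched in a couple of lines of shell; the
block number 346965 below is just the one from the "invalid page header"
errors earlier in this thread, and 8192 assumes the default block size:)

```shell
# Byte offset of a page, given the block number from ctid.
# Assumes the default 8k block size (BLCKSZ = 8192).
BLOCK=346965          # block number from the error messages above
BLKSZ=8192
OFFSET=$((BLOCK * BLKSZ))
echo "byte offset: $OFFSET"   # prints "byte offset: 2842337280"
```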

To find the file to look for the block in do:

postgres=# select relfilenode from pg_class where relname = 'tablename';
relfilenode
-------------
16384
(1 row)

Note that if the file offset is over 1GB then you would be looking for a
file named 16384.N, where N is which gigabyte chunk.
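(A sketch of how to work out which segment file holds the block, assuming
8k blocks and the standard 1GB segment size; relfilenode 16384 and block
346965 are just the example values from this thread, and the dd path is
hypothetical:)

```shell
# Map a block number to its 1GB segment file and position within it.
# PostgreSQL splits a relation into 1GB segments: 16384, 16384.1, 16384.2, ...
RELFILENODE=16384
BLOCK=346965
BLKSZ=8192
SEGBLOCKS=$((1073741824 / BLKSZ))   # blocks per 1GB segment = 131072
SEG=$((BLOCK / SEGBLOCKS))          # which segment file
SKIP=$((BLOCK % SEGBLOCKS))         # block index within that segment
if [ "$SEG" -eq 0 ]; then FILE="$RELFILENODE"; else FILE="$RELFILENODE.$SEG"; fi
echo "file $FILE, block $SKIP within segment"
# To pull out just that page for inspection (path is hypothetical):
# dd if=/path/to/data/base/DBOID/$FILE bs=$BLKSZ skip=$SKIP count=1 of=bad_page.bin
```

For block 346965 this lands in segment file 16384.2, since 346965 is past
the first two 131072-block segments.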

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

