Could not open file "pg_clog/...."

From: "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Could not open file "pg_clog/...."
Date: 2009-05-12 10:04:58
Message-ID: 28011CD60FB1724DBA4442E38277F6260CB11927@hermes.computec.de
Lists: pgsql-general

Hello!

Recently one of my PostgreSQL servers has started throwing error
messages like these:

ERROR: could not access status of transaction 3489956864
DETAIL: Could not open file "pg_clog/0D00": No such file or directory

The machine in question doesn't show any signs of a hardware defect. We're
running a RAID-10 over 10 disks for this partition on a 3Ware hardware RAID
controller with a battery backup unit, and the controller doesn't report any
defects at all. We're running PostgreSQL 8.3.5 on that box; the kernel is
2.6.18-6-amd64 from Debian Etch, and the PostgreSQL binaries were compiled
from source on that machine.

I searched the lists, and though I couldn't find a definite explanation of
what's causing this, I found a suggestion for a quick workaround: create a
zero-filled file of the required size and put it into the clog directory,
i.e.
dd bs=262144 count=1 if=/dev/zero of=/tmp/pg_clog_replacements/0002
chown postgres:daemon /tmp/pg_clog_replacements/0002
chmod 600 /tmp/pg_clog_replacements/0002
mv /tmp/pg_clog_replacements/0002 /var/lib/pgsql/data/pg_clog
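
For reference, here is a minimal sketch of how the segment file name and the 262144-byte size relate to a transaction ID. It assumes the default clog layout (8192-byte pages, 32 pages per segment file, 2 status bits per transaction), none of which is stated in this post; a server built with a different block size would give different numbers. The function name clog_segment is made up for illustration.

```shell
#!/bin/bash
# Sketch: map a transaction ID to the pg_clog segment file that holds it.
# Assumed defaults (not confirmed by the post): 8192-byte pages,
# 32 pages per segment, 4 transactions per byte (2 status bits each).
BLCKSZ=8192
PAGES_PER_SEGMENT=32
XACTS_PER_BYTE=4

SEGMENT_BYTES=$((BLCKSZ * PAGES_PER_SEGMENT))          # size of one segment file
XACTS_PER_SEGMENT=$((SEGMENT_BYTES * XACTS_PER_BYTE))  # transactions per file

# clog_segment XID: print the four-hex-digit segment name for XID.
clog_segment() {
    printf '%04X\n' $(( $1 / XACTS_PER_SEGMENT ))
}

clog_segment 3489956864   # transaction ID from the error message; prints 0D00
echo "$SEGMENT_BYTES"     # prints 262144, the bs= value passed to dd
```

Under these assumptions the transaction in the error message falls into segment 0D00, so a replacement file for that error would have to be named 0D00 rather than 0002.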

I know that I'd be losing some transactions, but in our use case this
is not critical. Anyway, this made the problem go away for a while, but
now I'm getting those messages again, and indeed the clog files in
question appear to be missing altogether. What's worse, the workaround
no longer works properly and instead makes PostgreSQL crash:

magazine=# vacuum analyze pcaction.article;
PANIC: corrupted item pointer: 5
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

And from the logfile:

<2009-05-12 11:38:09 CEST - 6606: [local]@magazine>PANIC: corrupted
item pointer: 5
<2009-05-12 11:38:09 CEST - 6606: [local]@magazine>STATEMENT: vacuum
analyze pcaction.article;
<2009-05-12 11:38:09 CEST - 29178: @>LOG: server process (PID 6606) was
terminated by signal 6: Aborted
<2009-05-12 11:38:09 CEST - 29178: @>LOG: terminating any other active
server processes
<2009-05-12 11:38:09 CEST - 6607:
192.168.222.134(57292)@magazine>WARNING: terminating connection because
of crash of another server process
<2009-05-12 11:38:09 CEST - 6607:
192.168.222.134(57292)@magazine>DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit,
because another server process exited abnormally and possibly corrupted
shared memory.
<2009-05-12 11:38:09 CEST - 6569: 192.168.222.134(57214)@bluebox>HINT:
In a moment you should be able to reconnect to the database and repeat
your command.
[...]
<2009-05-12 11:38:09 CEST - 29178: @>LOG: all server processes
terminated; reinitializing
<2009-05-12 11:38:09 CEST - 6619: @>LOG: database system was
interrupted; last known up at 2009-05-12 11:37:51 CEST
<2009-05-12 11:38:09 CEST - 6619: @>LOG: database system was not
properly shut down; automatic recovery in progress
<2009-05-12 11:38:09 CEST - 6619: @>LOG: redo starts at 172/8B4EE118
<2009-05-12 11:38:09 CEST - 6619: @>LOG: record with zero length at
172/8B6AD510
<2009-05-12 11:38:09 CEST - 6619: @>LOG: redo done at 172/8B6AD4E0
<2009-05-12 11:38:09 CEST - 6619: @>LOG: last completed transaction was
at log time 2009-05-12 11:38:09.550175+02
<2009-05-12 11:38:09 CEST - 6619: @>LOG: checkpoint starting: shutdown
immediate
<2009-05-12 11:38:09 CEST - 6619: @>LOG: checkpoint complete: wrote 351
buffers (1.1%); 0 transaction log file(s) added, 0 removed, 2 recycled;
write=0.008s, sync=0.000 s, total=0.009 s
<2009-05-12 11:38:09 CEST - 6622: @>LOG: autovacuum launcher started
<2009-05-12 11:38:09 CEST - 29178: @>LOG: database system is ready to
accept connections

Now what exactly is causing those missing clog files, what can I do to
prevent this and what can I do to recover my database cluster, as this
issue seems to prevent proper dumps at the moment?

Kind regards

Markus
