Loss of table structure on 7.3.19

From: Nigel Metheringham <nigel(dot)metheringham(at)dev(dot)intechnology(dot)co(dot)uk>
To: pgsql-admin(at)postgresql(dot)org
Subject: Loss of table structure on 7.3.19
Date: 2009-09-17 12:42:19
Message-ID: BD3938DF-A126-45DA-BCE0-E5D226F092D6@dev.intechnology.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi,

I have a database thats been running in production use since 2006 on a
Centos 4.7 (originally an earlier 4 release, updated incrementally).
The pg version is somewhat ancient as we have stuck with the system
postgres - currently postgresql-7.4.19-1.el4_6.1.

Yesterday it all fell apart with all queries/updates into it having
issues. A check showed that many of the tables had lost their
definitions - for example the task_log table now consisted on a single
timestamp field rather than the selection of fields that would
normally be there.

Fortunately we have a recent backup and could restore the db with the
loss of only a couple of hours of data.

So now the post-mortum.

Suspiciously, in the pg_clog directory there were 2049 files (0000 to
0800) - the time the failure happened coincided with the modification
time on file 07FF. I see in the documentation (http://www.postgresql.org/docs/7.3/static/routine-vacuuming.html
) warnings regarding XID rollover at 1 billion - since it appears clog
datastructures are 2 bits per XID then 2048 files of 256KB would be
1,048,576 which is close enough to 1 billion for me, and the effects
are close to those described....

However we do have a regular vacuuming process - every day each table
is VACUUM ANALYZE-ed (as well as an index rebuild).

I've experimented this morning, and nothing I can do with the VACUUM
command (FULL, against a table or the whole db etc) appears to make
the number of CLOG files reduce. If I understand their function
correctly there is no reason for them to hang around and multiply
unless there are long-running open transactions (which there are not)
- and surely if these were just open transaction related then a
database shutdown/restart ought to scrub them.

So the questions are:-
- Is this plethora of (active) CLOG files likely to be reason for
the problem?
- Or am I barking up entirely the wrong tree?
- How can I avoid this in the future?
- Is there any other information I should be digging out (I don't
have much
in the way of postgres logs)?

I've also got a Centos 5.x box with a similar installation but
postgres version 8.1.11 on it. This also appears to have an
increasing number of pg_clog files, and a full vacuum is not stopping
this...

Cheers
Nigel.

--
[ Nigel Metheringham Nigel(dot)Metheringham(at)InTechnology(dot)com ]
[ - Comments in this message are my own and not ITO opinion/policy - ]

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Andre Rothe 2009-09-17 13:36:46 Wrong column default values
Previous Message brahma tiwari 2009-09-17 10:04:43 Re: PostgreSQL 8.3