Database corruption help

From: John Lister <john(dot)lister-ps(at)kickstone(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Database corruption help
Date: 2009-02-13 08:20:03
Message-ID: 49952D33.3000707@kickstone.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

Originally in psql-admin, but copied here at the request of Tom to..

Story so far, transaction log archiving went wrong causing the
transaction log disk to fill up. Foolishly i deleted the unarchived
transaction logs (early monday morning) which required a pg_resetxlog to
get the db up and running again. Since then we've had some minor db
corruption which has been fixed (mainly duplicate primary keys) except
for the pg_class table.

If i do a vacuum full on pg_class i get something like this:

INFO: vacuuming "pg_catalog.pg_class"INFO: "pg_class": found 37
removable, 1845 nonremovable row versions in 18905 pages
DETAIL: 27 dead row versions cannot be removed yet.
Nonremovable row versions range from 160 to 229 bytes long.
There were 933834 unused item pointers.
Total free space (including removable row versions) is 150368692 bytes.
18839 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 0.01s/0.05u sec elapsed 0.17 sec.INFO: index "pg_class_oid_index"
now contains 1813 row versions in 7 pages
DETAIL: 56 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO: index
"pg_class_relname_nsp_index" now contains 1818 row versions in 24 pages
DETAIL: 63 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.WARNING: index
"pg_class_relname_nsp_index" contains 1818 row versions, but table
contains 1813 row versions
HINT: Rebuild the index with REINDEX.INFO: analyzing
"pg_catalog.pg_class"INFO: "pg_class": scanned 3000 of 18905 pages,
containing 227 live rows and 6 dead rows; 227 rows in sample, 1430
estimated total rows
Total query runtime: 4469 ms.

As you can see there are non-removable dead rows (which slowly grows)
and the table size is also increasing in size.. A reindex on the
indexes mentions also succeeds but another vacuum reports the same thing...

In the log files the relevant bits are:

GMT LOG: relation "pg_class" TID 15538/4: dead HOT-updated tuple ---
cannot shrink relation
2009-02-12 21:06:40 GMT STATEMENT: VACUUM FULL VERBOSE ANALYZE pg_class
2009-02-12 21:06:40 GMT WARNING: index "pg_class_relname_nsp_index"
contains 1818 row versions, but table contains 1813 row versions

and finally the last message in the psql-admin thread suggested dumping
the above page, which is attached before.

>Hmm. The comments in vacuum.c about this case suppose that it could
>only be a transient condition, ie the tuple became dead while we were
>looking at the page. Evidently it's persistent for you, which means
>that for some reason heap_page_prune() is failing to remove an
>already-dead tuple. I suspect this implies a corrupt HOT chain, but
>maybe the data is okay and what you've got is really a bug in
>heap_page_prune.

>Could you send a dump of page 15538 of pg_class, using pg_filedump?
>The source code for it is here:
>http://sources.redhat.com/rhdb/utilities.html
>Best would be -i -f format, ie
> pg_filedump -i -f -R 15538 $PGDATA/base/something/1259
>
> regards, tom lane

Any help would be appreciated as the pg_class table is constantly
growing which i'm guessing is going to start to affect performance
fairly soon. I'd like to avoid a full restore from backup if possible.

Thanks

JOHN

Attachment Content-Type Size
dump.txt text/plain 18.6 KB

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message John Lister 2009-02-13 09:17:39 Re: Database corruption help
Previous Message Tom Lane 2009-02-12 22:13:03 Re: [ADMIN] database corruption help

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2009-02-13 08:39:27 Re: WIP: hooking parser
Previous Message Simon Riggs 2009-02-13 08:14:27 Re: Hot Standby: subxid cache changes