Re: Database corruption help

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: John Lister <john(dot)lister-ps(at)kickstone(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Database corruption help
Date: 2009-02-13 16:19:00
Message-ID: 3161.1234541940@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

John Lister <john(dot)lister-ps(at)kickstone(dot)com> writes:
> Originally in psql-admin, but copied here at the request of Tom to..

Thanks for forwarding this. The reason I wanted to call it to the
attention of pgsql-hackers is that the page contents seem a bit odd,
and I'm not sure that we should just write it off as "pilot error".
What we've got here is a page full of transient states of the pg_class
row for temp_queue, which as you explained is constantly being
re-TRUNCATEd by your application. So the data contents of each state
of the row vary only in relfilenode, as expected. One would also expect
that all the copies on a particular page of pg_class would form a single
HOT chain (the database is 8.3.5). What we've actually got is a chain
that is broken in two places and lacks a root tuple(!).

How could it have got that way? John's ill-advised removal of the
transaction logs should have resulted only in the page being a lot older
than it should be, not in a logically corrupt page.

The only other corruption mechanism I can think of is that pg_clog might
contain commit bits for some logically inconsistent set of transaction
numbers, due to some pages of pg_clog having made it to disk and others
not. That could result in some of the intermediate tuples in the chain
not being seen as dead --- but that's not what we see here either.

Aside from the "how did this happen" puzzle, the real point of any
investigation of course ought to be whether we can make heap_page_prune
more robust. At the very least it's undesirable to be leaving the page
in a state where VACUUM FULL will decide it can't shrink.

Ideas anyone?

regards, tom lane

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2009-02-13 16:28:28 Re: Database corruption help
Previous Message John Lister 2009-02-13 09:17:39 Re: Database corruption help

Browse pgsql-hackers by date

  From Date Subject
Next Message Jaime Casanova 2009-02-13 16:24:46 Re: Updates of SE-PostgreSQL 8.4devel patches (r1530)
Previous Message Tom Lane 2009-02-13 15:47:44 Re: composite types DROP..CASCADE behaviour - bug or intentional?