Re: zombie primary key lurches out of database to devour the brains of the unwary

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Murphy <murphy(at)genome(dot)chop(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: zombie primary key lurches out of database to devour the brains of the unwary
Date: 2004-09-02 14:11:32
Message-ID: 19417.1094134292@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Kevin Murphy <murphy(at)genome(dot)chop(dot)edu> writes:
> On Aug 31, 2004, at 4:41 PM, Tom Lane wrote:
>> That's strange. Maybe some sort of catalog corruption? Try "psql -E"
>> to see the queries issued by \d, and then execute them by hand to see
>> what you get.

> egenome_test=# SELECT i.indisunique, i.indisprimary, a.amname,
> c2.relname,
> pg_catalog.pg_get_expr(i.indpred, i.indrelid)
> FROM pg_catalog.pg_index i, pg_catalog.pg_class c, pg_catalog.pg_class
> c2, pg_catalog.pg_am a
> WHERE i.indexrelid = c.oid AND c.oid = '112337075' AND c.relam = a.oid
> AND i.indrelid = c2.oid
> ;
> egenome_test-# egenome_test-# egenome_test-# egenome_test-#
> egenome_test-# indisunique | indisprimary | amname | relname |
> pg_get_expr
> -------------+--------------+--------+---------+-------------
> (0 rows)

Looking at psql's describe code, it will indeed abort and print nothing
at all when this query returns zero rows. So the question now reduces
to why this didn't find anything. The theories that come to mind are
(a) one of the relevant catalog rows is in fact missing, or (b) one of
the indexes that are used is corrupted so that the needed row isn't
found. From the prior queries we already know the index's pg_class and
pg_index rows are there, and it's highly unlikely that pg_am would be
damaged since it's essentially read-only, so that means the index's
parent table's pg_class row is deleted or unfindable.

Corruption of pg_class's OID index is certainly not impossible, but I'd
think it'd lead to worse problems than this. My bet is on an incomplete
DROP operation --- ie, the parent table was dropped but for some reason
its child index wasn't. Corruption of the pg_depend link from the one
to the other could have done it. If you've had any database crashes
lately, then it's possible that the ultimate culprit is the transaction-
vs-checkpoint sync bug we identified a couple weeks ago. That can lead
to whole or partial loss of committed transactions :-(

regards, tom lane

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jeff Boes 2004-09-02 14:14:58 Re: Possible to insert quoted null value into integer field?
Previous Message Tom Lane 2004-09-02 13:45:11 Re: plpgsql function