Re: error: could not find pg_class tuple for index 2662

From: daveg <daveg(at)sonic(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: error: could not find pg_class tuple for index 2662
Date: 2011-08-03 11:57:31
Message-ID: 20110803115731.GA14353@sonic.net
Lists: pgsql-hackers

On Mon, Aug 01, 2011 at 01:23:49PM -0400, Tom Lane wrote:
> daveg <daveg(at)sonic(dot)net> writes:
> > On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote:
> >> I think we need to start adding some instrumentation so we can get a
> >> better handle on what's going on in your database.  If I were to send
> >> you a source-code patch for the server that adds some more logging
> >> printout when this happens, would you be willing/able to run a patched
> >> build on your machine?
> 
> > Yes we can run an instrumented server so long as the instrumentation does
> > not interfere with normal operation. However, scheduling downtime to switch
> > binaries is difficult, and generally needs to happen on a weekend, but
> > sometimes can be expedited. I'll look into that.
> 
> OK, attached is a patch against 9.0 branch that will re-scan pg_class
> after a failure of this sort occurs, and log what it sees in the tuple
> header fields for each tuple for the target index.  This should give us
> some useful information.  It might be worthwhile for you to also log the
> results of
> 
> select relname,pg_relation_filenode(oid) from pg_class
> where relname like 'pg_class%';
> 
> in your script that does VACUUM FULL, just before and after each time it
> vacuums pg_class.  That will help in interpreting the relfilenodes in
> the log output.
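
For reference, the before/after logging suggested above might be wrapped
around the vacuum step roughly as follows. This is only a sketch: it assumes
the maintenance script feeds plain statements to psql outside any explicit
transaction block, and the now() column and the column aliases are additions
for readability, not part of the suggested query.

```sql
-- Sketch only: assumes these statements are run through psql, outside an
-- explicit transaction block (VACUUM cannot run inside one).

-- Before: record the relfilenodes of pg_class and its indexes.
select now() as logged_at, relname, pg_relation_filenode(oid) as filenode
from pg_class
where relname like 'pg_class%';

vacuum full pg_class;

-- After: record them again so that relfilenodes appearing in the server log
-- can be matched to the state before or after this vacuum.
select now() as logged_at, relname, pg_relation_filenode(oid) as filenode
from pg_class
where relname like 'pg_class%';
```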

We have installed the patch and have encountered the error as usual.
However, the patch produced no additional log output. I'm speculating
that the pg_class scan in ScanPgRelationDetailed() somehow fails to
return any tuples.

I have also been trying to trace it further by reading the code, but do not
have a solid hypothesis yet. In the absence of any debugging output, I've
been trying to deduce the call tree leading to the original failure. So far
it looks like this:

RelationReloadIndexInfo(Relation)
    // Relation is 2662 and !rd_isvalid
    pg_class_tuple = ScanPgRelation(2662, indexOK=false)  // returns NULL
        pg_class_desc = heap_open(1259, AccessShareLock)
            r = relation_open(1259, AccessShareLock) // locks oid, ensures RelationIsValid(r)
                r = RelationIdGetRelation(1259)
                    r = RelationIdCacheLookup(1259)   // assume success
                    if !rd_isvalid:
                        RelationClearRelation(r, true)
                            RelationInitPhysicalAddr(r) // r is pg_class relcache

-dg

-- 
David Gould       daveg(at)sonic(dot)net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
