Re: error: could not find pg_class tuple for index 2662

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: daveg <daveg(at)sonic(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: error: could not find pg_class tuple for index 2662
Date: 2011-08-01 17:23:49
Message-ID: 27557.1312219429@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

daveg <daveg(at)sonic(dot)net> writes:
> On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote:
>> I think we need to start adding some instrumentation so we can get a
>> better handle on what's going on in your database. If I were to send
>> you a source-code patch for the server that adds some more logging
>> printout when this happens, would you be willing/able to run a patched
>> build on your machine?

> Yes we can run an instrumented server so long as the instrumentation does
> not interfere with normal operation. However, scheduling downtime to switch
> binaries is difficult, and generally needs to be happen on a weekend, but
> sometimes can be expedited. I'll look into that.

OK, attached is a patch against 9.0 branch that will re-scan pg_class
after a failure of this sort occurs, and log what it sees in the tuple
header fields for each tuple for the target index. This should give us
some useful information. It might be worthwhile for you to also log the
results of

select relname,pg_relation_filenode(oid) from pg_class
where relname like 'pg_class%';

in your script that does VACUUM FULL, just before and after each time it
vacuums pg_class. That will help in interpreting the relfilenodes in
the log output.

> My observations so far are:

> - the error occurs at commit of vacuum full of pg_class
> - in these cases error hits autovacuum after it waited for a lock on pg_class
> - in these two cases there was a new process startup while the vacuum was
> running. Don't know if this is relevant.

Interesting. We'll want to know whether that happens every time.

> - while these hit autovacuum, the error does hit other processs (just not in
> these sessions). Unknown if autovacuum is a required component.

Good question. Please consider setting log_autovacuum_min_duration = 0
so that the log also traces all autovacuum activity.

regards, tom lane

Attachment Content-Type Size
log_missing_indexes-1.patch text/x-patch 3.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-08-01 17:31:22 Re: Compressing the AFTER TRIGGER queue
Previous Message Simon Riggs 2011-08-01 17:08:23 Re: One-Shot Plans