Re: Large Database \d: ERROR: cache lookup failed for relation ...

From: Jim Nasby <decibel(at)decibel(dot)org>
To: Jones Erik <erik(at)myemma(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jonah H(dot) Harris" <jonah(dot)harris(at)enterprisedb(dot)com>
Cc: Postgres general mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Large Database \d: ERROR: cache lookup failed for relation ...
Date: 2007-06-06 04:49:31
Message-ID: EE2BA3FF-5847-42A7-BD51-857778B85019@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I'm working with these guys to resolve the immediate issue, but I
suspect there's a race condition somewhere in the code.

What's happened is that OIDs have been changed in the system. There's
not a lot of table DDL that happens, but there is a substantial
amount of view DDL that can take place. In a nutshell, tables will
sometimes have fields added to them, and when that happens a whole
set of views needs to be re-created to take the new fields into account.

The files for corrupted tables do exist; this seems to be mostly a
catalog corruption issue. I'm seeing both what appear to be
inconsistencies between relcache and the catalog tables as well as
corruption between tables themselves:

emma2=# select * from userdata_8464_campaigns;
ERROR: could not open relation with OID 138807643
emma2=# \d userdata_8464_campaigns
Table
"public.userdata_8464_campaigns"
Column | Type
| Modifiers
-------------------------------+-----------------------------
+------------------------------------------------------------------
campaign_id | bigint | not null
default nextval(('emma_campaigns_seq'::text)::regclass)
account_id | bigint | not null
cep_object_id | bigint | not
null default nextval(('cep_object_seq'::text)::regclass)
campaign_name | character varying(255) | not null
campaign_subject | character varying(255) | not null
layout_page_id | bigint | not null
layout_content_id | bigint | not null
campaign_create_date | timestamp without time zone | not null
default now()
campaign_last_mod_date | timestamp without time zone | not null
default now()
campaign_status | character varying(50) | not null
campaign_parent_id | bigint |
published_campaign_id | bigint |
campaign_plaintext | text |
campaign_plaintext_ds | timestamp without time zone |
delivery_old_score | double precision |
campaign_person_defaults | text |
Inherits: emma_campaigns

select oid from pg_class where relname='userdata_8464_campaigns';
oid
--------
533438
(1 row)

And that file actually does exist on disk...

select * from pg_index where indexrelid=138807643;
indexrelid | indrelid | indnatts | indisunique | indisprimary |
indisclustered | indisvalid | indkey | indclass | indexprs | indpred
------------+----------+----------+-------------+--------------
+----------------+------------+--------+----------+----------+---------
138807643 | 533438 | 1 | t | t |
f | t | 1 | 1980 | |
(1 row)

select * from pg_class where oid=138807643;
relname | relnamespace | reltype | relowner | relam | relfilenode |
reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid
| relhasindex | relisshared | relkind | relnatts | relchecks |
reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey
| relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
---------+--------------+---------+----------+-------+-------------
+---------------+----------+-----------+---------------
+---------------+-------------+-------------+---------+----------
+-----------+-------------+----------+----------+---------
+------------+------------+-------------+----------------
+--------------+--------+------------
(0 rows)

On Jun 5, 2007, at 11:27 AM, Erik Jones wrote:

> I originally sent this message from my gmail account yesterday as
> we were having issues with our work mail servers yesterday, but
> seeing that it hasn't made it to the lists yet, I'm resending from
> my registered address. You have my apologies if you receive this
> twice.
>
> "Thomas F. O'Connell" <tf ( at ) o ( dot ) ptimized ( dot ) com>
> writes:
> > I'm dealing with a database where there are ~150,000 rows in
>
> > information_schema.tables. I just tried to do a \d, and it came back
> > with this:
>
> > ERROR: cache lookup failed for relation [oid]
>
> > Is this indicative of corruption, or is it possibly a resource
> issue?
>
> Greetings,
>
> This message is a follow-up to Thomas's message quoted above (we're
> working together on the same database). He received one response
> when he sent the above message which was from Tom Lane and can be
> easily summarized as him having said that that could happen tables
> were being created or dropped while running the \d in psql.
> Unfortunately, that wasn't the case, we have now determined that
> there is some corruption in our database and we are hoping some of
> you back-end gurus might have some suggestions.
>
> How we verified that there is corruption was simply to reindex all
> of our tables in addition to getting the same errors when running a
> dump this past weekend. We so far have a list of five tables for
> which reindex fails with the error: "ERROR: could not open relation
> with OID xxxx" (sub xxxx with the five different #s) and one that
> fails reindexing with "ERROR: xxxxx is an index" where is an index
> on a completely different table. After dropping all of the indexes
> on these tables (a couple didn't have any to begin with), we still
> cannot run reindex on them. In addition, we can't drop the tables
> either (we get the same errors). We can however run alter table
> statements on them. So, we have scheduled a downtime for an evening
> later this week wherein we plan on bringing the database down for a
> REINDEX SYSTEM and before that we are going to run a dump excluding
> those tables, restore that on a separate machine and see if these
> errors crop up there anywhere. Is there anything else anyone can
> think of that we can do to narrow down where the actual corruption
> is or how to fix it?
>
> Erik Jones
>
> Software Developer | Emma®
> erik(at)myemma(dot)com
> 800.595.4401 or 615.292.5888
> 615.292.0777 (fax)
>
> Emma helps organizations everywhere communicate & market in style.
> Visit us online at http://www.myemma.com
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Vincenzo Romano 2007-06-06 07:00:14 Re: Running v8.1 amd v8.2 at the same time for a transition
Previous Message Joe Conway 2007-06-06 03:34:39 Re: Encrypted column