Re: [GENERAL] cache lookup of relation 165058647 failed

From: Sean Chittenden <sean(at)chittenden(dot)org>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: PostgreSQL Bugs List <pgsql-bugs(at)postgresql(dot)org>, Juris Krumins <juriskr(at)komin(dot)lv>
Subject: Re: [GENERAL] cache lookup of relation 165058647 failed
Date: 2004-05-05 20:40:54
Message-ID: 816FE1CE-9ED4-11D8-B669-000A95C705DC@chittenden.org
Lists: pgsql-bugs pgsql-general

>>> I've found out that this error occurs in the dependency.c file:
>>>
>>> 2004-04-26 11:09:34 ERROR: dependency.c 1621: cache lookup of relation 149064743 failed
>>> 2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist
>>> 2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist
>>>
>>> in the getRelationDescription(StringInfo buffer, Oid relid) function.
>>>
>>> Any ideas what can cause these errors?
>> <aol>Me too.</aol>
>> But I suspect it's a race condition with the new
>> background writer code. I've started testing a new database design
>> and was able to reproduce this on my laptop nearly 90% of the time,
>> but only about 10% of the time on my production databases, until I
>> figured out what the difference was: fsync.
>
> Temp tables don't use the shared buffer cache, so how can this be
> related to the BG writer?

Don't the system catalogs use the shared buffer cache?

BEGIN;
SELECT create_temp_table_func(); -- Inserts a row into pg_class via CREATE TEMP TABLE
-- Do other stuff
COMMIT; -- After the commit, the row is now visible to other backends
-- disconnect  -- If the delay between the disconnect and reconnect is small enough,
-- reconnect   -- it's as though there is a race condition that lets
--                pg_table_is_visible() raise the "cache lookup of relation"
--                error.
BEGIN;
SELECT create_temp_table_func(); -- Before the CREATE TEMP TABLE, I call the following
                                 -- (quotes are doubled because it lives in the function body):
/* SELECT TRUE FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
   WHERE c.relname = ''footmp''::TEXT AND
         c.relkind = ''r''::TEXT AND
         pg_catalog.pg_table_is_visible(c.oid); */
-- But the query fails

My guess was that the series of events went something like:

proc 0) COMMITs and the row in pg_class is committed
proc 1) bgwriter code removes a page from the cache
proc 2) queries for the page [*]
proc 1) writes it to disk
proc 2) queries for the page [*]
proc 1) syncs the fd

[*] proc 2 queries for the page at either of these points
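The interleaving above can be replayed as a toy, single-threaded model. This is a minimal sketch, assuming (exactly as the guess does) that a catalog page is unreadable from the moment the bgwriter evicts it until the fd is sync()ed; that assumption is the thing under suspicion, not documented PostgreSQL behavior. `PageState` and `run_schedule` are hypothetical names made up for illustration.

```python
from enum import Enum, auto


class PageState(Enum):
    CACHED = auto()     # page sits in the shared buffer cache
    IN_FLIGHT = auto()  # evicted but not yet sync()ed (the hypothesized window)
    ON_DISK = auto()    # written and sync()ed


def run_schedule(schedule):
    """Replay (proc, action) steps; return one bool per proc 2 lookup (True = found)."""
    committed = False
    state = PageState.CACHED
    lookups = []
    for proc, action in schedule:
        if (proc, action) == (0, "commit"):
            committed = True                 # pg_class row becomes visible
        elif (proc, action) == (1, "evict"):
            state = PageState.IN_FLIGHT      # bgwriter drops the page from the cache
        elif (proc, action) == (1, "write"):
            pass                             # still IN_FLIGHT until the fd is synced
        elif (proc, action) == (1, "fsync"):
            state = PageState.ON_DISK
        elif (proc, action) == (2, "lookup"):
            # Under the hypothesis, a lookup inside the window finds nothing,
            # which would surface as "cache lookup of relation ... failed".
            lookups.append(committed and state is not PageState.IN_FLIGHT)
        else:
            raise ValueError(f"unknown step {proc}:{action}")
    return lookups


racy = [(0, "commit"), (1, "evict"), (2, "lookup"),
        (1, "write"), (2, "lookup"), (1, "fsync")]
print(run_schedule(racy))
```

Replaying the racy schedule prints `[False, False]`: both lookups land in the window. A schedule where the fsync happens before the lookup, or one with no bgwriter steps at all (the 7.4 case), yields `[True]`.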

In 7.4, there is no bgwriter or other background process mucking with the
cache, which is why this works 100% of the time. In 7.5, however, there's a
200ms gap where a race condition appears and pg_table_is_visible() fails its
PointerIsValid() check. If I put a sleep in, the sleep gives the bgwriter
enough time to commit the pages to disk, so the queries for the page happen
after the fd has been sync()ed.
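The sleep workaround can be expressed on the client side as a retry-after-delay wrapper around the visibility check. This is a minimal sketch under the same unconfirmed hypothesis; it hides the symptom rather than fixing anything. `retry_after_delay` is a hypothetical helper, and the 0.2 s default simply mirrors the 200ms gap mentioned above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_after_delay(fn: Callable[[], T], attempts: int = 3, delay_s: float = 0.2) -> T:
    """Call fn(); if it raises, sleep delay_s and retry, up to `attempts` calls in total."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the original error unchanged
            # Per the hypothesis, waiting lets the bgwriter finish write + sync.
            time.sleep(delay_s)
    raise AssertionError("unreachable")
```

In practice `fn` would be a hypothetical function that runs the pg_table_is_visible() query shown earlier and raises on the "cache lookup" error.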

I have no other clue as to why this would be happening, so believe me
when I say I could very well be quite wrong, but this is my best
quasi-educated/grep(1)'ed guess.

-sc

--
Sean Chittenden
