Recursive use of syscaches (was: relation ### modified while in use)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Recursive use of syscaches (was: relation ### modified while in use)
Date: 2000-11-09 18:51:17
Message-ID: 15452.973795877@sss.pgh.pa.us
Lists: pgsql-hackers

"Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> writes:
>> Does this occur after a prior error message? I have been suspicious
>> because there isn't a mechanism to clear the syscache-busy flags during
>> xact abort.

> I don't know whether I've seen the cases you pointed out.
> I have the following gdb backtrace. Obviously it calls
> SearchSysCache() for cacheId 10 twice. I was able
> to get another gdb backtrace but discarded it by
> mistake. Although I've added pause() just after detecting
> recursive use of a cache, unfortunately backends continue
> execution in most cases.
> I've not examined the backtrace yet. But don't we have
> to nail more system relation descriptors than we do now?

I don't think that's the solution; nailing more descriptors than we
absolutely must is not a pretty approach, and I don't think it solves
this problem anyway. Your example demonstrates that recursive use
of a syscache is perfectly possible when a cache inval message arrives
just as we are about to search for a syscache entry. Consider
the following path:

1. We are doing index_open and ensuing relcache entry load for some user
index. In the middle of this, we need to fetch a not-currently-cached
pg_amop entry that is referenced by the index.

2. As we open pg_amop, we receive an SI (shared invalidation) message for some other user
index that is referenced in the current query and so currently has
positive refcnt. We therefore attempt to rebuild that index's relcache
entry.

3. At this point we have recursive invocation of relcache load, which
may well lead to a recursive attempt to fetch the very same pg_amop
entry that the outer relcache load is trying to fetch.
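The path above can be sketched as a toy C model. This is not PostgreSQL source: the flag, the counters, and every function name (search_amop_cache, open_pg_amop, and so on) are hypothetical stand-ins, and where the real check raises an error the sketch only counts. Both lookups ask for the same key, so a test refined to notice the exact entry being sought would fire as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the three-step path; not PostgreSQL code.
 * All names are hypothetical stand-ins. */

static bool amop_cache_busy = false;    /* the per-cache re-entrancy flag */
static bool si_message_pending = true;  /* inval for another in-use index */
static int  recursion_errors = 0;       /* times the old test would fire */
static int  depth = 0;                  /* current nesting of lookups */
static int  max_depth = 0;              /* deepest nesting observed */

static void rebuild_other_index_relcache(void);

/* Step 2: opening pg_amop absorbs pending SI messages. */
static void open_pg_amop(void)
{
    if (si_message_pending) {
        si_message_pending = false;      /* message consumed first */
        rebuild_other_index_relcache();  /* step 3: nested relcache load */
    }
}

/* Cache-miss path for one pg_amop entry, identified by key. */
static void search_amop_cache(int key)
{
    (void) key;                 /* both callers ask for the same entry */
    if (amop_cache_busy)
        recursion_errors++;     /* the real check raised an error here */
    bool was_busy = amop_cache_busy;
    amop_cache_busy = true;
    if (++depth > max_depth)
        max_depth = depth;

    open_pg_amop();             /* read the catalog */

    --depth;
    assert(depth >= 0);
    amop_cache_busy = was_busy; /* restore the caller's flag state */
}

/* Step 3: the rebuilt index references the very same pg_amop entry. */
static void rebuild_other_index_relcache(void)
{
    search_amop_cache(42);
}

/* Step 1: relcache load for a user index needs an uncached pg_amop entry. */
static void load_user_index_relcache(void)
{
    search_amop_cache(42);
}
```

Calling load_user_index_relcache() leaves recursion_errors at 1 and max_depth at 2: the recursion is real but bounded, because the SI message is consumed before the nested load starts.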

Therefore, the current error test of checking for re-entrant lookups in
the same syscache is bogus. It would still be bogus even if we refined
it to notice whether the exact same entry is being sought.

On top of that, there is the issue I was concerned about: there is
no mechanism for clearing the cache-busy flags during xact abort.
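A second toy sketch, again with hypothetical names and not PostgreSQL source, models the abort problem. It assumes elog(ERROR) behaves like a longjmp back to a transaction-level setjmp point; since nothing on that path resets the busy flag, the next ordinary lookup is mistaken for a recursive one.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Toy model of the abort problem; not PostgreSQL code. elog(ERROR) is
 * modeled as a longjmp to a transaction-level setjmp point (an assumption
 * of this sketch). All names are hypothetical. */

static jmp_buf xact_abort_point;
static bool cache_busy = false;

/* Lookup with the re-entrancy test. 'fail' simulates an error raised
 * while the catalog is being read. Returns 0 on success, -1 if the
 * busy test fires. */
static int search_cache(bool fail)
{
    if (cache_busy)
        return -1;                     /* "recursive use of cache" */
    cache_busy = true;
    if (fail)
        longjmp(xact_abort_point, 1);  /* error escapes; flag stays set */
    cache_busy = false;
    return 0;
}

/* One transaction: returns the lookup result, or 1 if it aborted. */
static int run_xact(bool fail)
{
    if (setjmp(xact_abort_point) != 0)
        return 1;                      /* abort path never clears cache_busy */
    return search_cache(fail);
}
```

After run_xact(true) aborts, cache_busy is still true, so a following run_xact(false) gets -1 even though no recursion is taking place.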

Rather than trying to fix this stuff, I propose that we simply remove
the test for recursive use of a syscache. AFAICS it will never catch
any real bugs in production. It might catch bugs in development (i.e.,
someone messes up the startup sequence in a way that causes a truly
circular cache lookup) but I think a stack overflow crash is a
perfectly OK result then.
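Under the proposal the lookup simply carries no busy flag. The toy sketch below (hypothetical names, not the actual SearchSysCache code) drops the test: the SI-driven re-entry completes after one nested lookup, and there is no flag left to go stale after an abort.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the proposal; not PostgreSQL code, names hypothetical.
 * The lookup keeps no busy flag, so nothing can go stale after an
 * abort, and SI-driven re-entry just nests one level. */

static bool si_message_pending = true;
static int  lookups = 0;

static void rebuild_other_index_relcache(void);

/* Opening the catalog absorbs any pending SI message. */
static void open_catalog(void)
{
    if (si_message_pending) {
        si_message_pending = false;    /* consumed before recursing */
        rebuild_other_index_relcache();
    }
}

/* Cache-miss path with no re-entrancy test. */
static int search_cache(int key)
{
    lookups++;
    open_catalog();
    return key;                        /* stand-in for the fetched tuple */
}

static void rebuild_other_index_relcache(void)
{
    (void) search_cache(42);           /* same entry as the outer lookup */
}
```

Here search_cache(42) returns 42 after two lookups. A truly circular lookup would have no message to consume and would recurse until the stack overflowed, which is the failure mode described as acceptable for development bugs.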

regards, tom lane
