Another nasty cache problem

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Another nasty cache problem
Date: 2000-01-30 15:41:13
Message-ID: 22885.949246873@sss.pgh.pa.us
Lists: pgsql-hackers

I'm down to the point where the parallel regression tests mostly pass with
a small SI (shared invalidation) buffer, but they still fail occasionally.
I've realized that there is a whole class of bugs along the following lines:

There are plenty of routines that do two or more SearchSysCacheTuple
calls to get the information they need. As the code stands, it is
unsafe to continue accessing the tuple returned by SearchSysCacheTuple
after making a second such call, because the second call could possibly
cause an SI cache reset message to be processed, thereby flushing the
contents of the caches.
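To make the hazard concrete, here is a minimal, self-contained C sketch. It does not use PostgreSQL's real catcache: ToyTuple, toy_search, toy_cache_reset, reset_pending and first_tuple_intact are all hypothetical names. The assumption is simply that a lookup returns a pointer into cache storage and that a later lookup may first process a pending reset, which frees everything. Filling the storage with 0x7F stands in for the freed memory being reused.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model of a system cache, NOT PostgreSQL's catcache.
   Lookups hand back pointers straight into cache storage. */
#define CACHE_SLOTS 4

typedef struct {
    int  key;
    char name[16];
    int  valid;
} ToyTuple;

static ToyTuple cache[CACHE_SLOTS];
static int reset_pending = 0;   /* stands in for a queued SI reset message */

/* A reset "frees" every entry; overwriting with 0x7F mimics the memory
   being reused for something else afterwards. */
static void toy_cache_reset(void) {
    memset(cache, 0x7F, sizeof cache);
}

/* Processes any pending reset first, then finds (or loads) the key. */
static ToyTuple *toy_search(int key) {
    assert(key >= 0);           /* keeps key % CACHE_SLOTS in range */
    if (reset_pending) {
        reset_pending = 0;
        toy_cache_reset();
    }
    ToyTuple *slot = &cache[key % CACHE_SLOTS];
    if (!(slot->valid == 1 && slot->key == key)) {
        slot->key = key;
        snprintf(slot->name, sizeof slot->name, "rel%d", key);
        slot->valid = 1;
    }
    return slot;
}

/* The risky pattern: keep using the first result after a second lookup.
   Returns 1 if the first tuple is still intact, 0 if it was clobbered. */
static int first_tuple_intact(int reset_between) {
    ToyTuple *a = toy_search(1);
    reset_pending = reset_between;
    (void) toy_search(2);       /* may process the reset, clobbering *a */
    return a->valid == 1 && a->key == 1 && strcmp(a->name, "rel1") == 0;
}
```

Without a reset between the calls, first_tuple_intact(0) returns 1; with one, first_tuple_intact(1) returns 0. That mirrors why the bug is so rarely observed: it needs both a reset and reuse of the freed memory.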

heap_open and CommandCounterIncrement can also cause cache entries to be
dropped, so holding a cached tuple across those calls is just as unsafe.

This is a very insidious kind of bug because the probability of
occurrence is very low (at the normal SI buffer size a reset is unlikely,
and even if one happens, you won't observe a failure unless the
pfree'd tuple is actually overwritten before you're done with it).
So we cannot hope to catch these things by testing.

I am not sure what to do about it. One solution path is to make
all the potential trouble spots do SearchSysCacheTupleCopy and then
pfree the copied tuple when done. However, that adds a nontrivial
amount of overhead, and it'd be awfully easy to miss some trouble
spots or to introduce new ones in the future.
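The copy-and-free approach can be sketched the same way. Again this is a hypothetical toy, not PostgreSQL code: toy_search_copy merely plays the role of SearchSysCacheTupleCopy, with malloc/free standing in for palloc/pfree. The point is that the caller's copy lives in separate memory, so a reset cannot touch it, at the cost of an allocation, a copy and a free on every lookup.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy cache, NOT PostgreSQL's catcache. */
#define CACHE_SLOTS 4

typedef struct {
    int  key;
    char name[16];
    int  valid;
} ToyTuple;

static ToyTuple cache[CACHE_SLOTS];

/* Flushes the cache; 0x7F stands in for freed-and-reused memory. */
static void toy_cache_reset(void) {
    memset(cache, 0x7F, sizeof cache);
}

/* Returns a pointer into cache storage (unsafe to hold across calls). */
static ToyTuple *toy_search(int key) {
    assert(key >= 0);
    ToyTuple *slot = &cache[key % CACHE_SLOTS];
    if (!(slot->valid == 1 && slot->key == key)) {
        slot->key = key;
        snprintf(slot->name, sizeof slot->name, "rel%d", key);
        slot->valid = 1;
    }
    return slot;
}

/* Analogue of SearchSysCacheTupleCopy: the caller owns the copy and
   must free it when done. Returns NULL if allocation fails. */
static ToyTuple *toy_search_copy(int key) {
    ToyTuple *copy = malloc(sizeof *copy);
    if (copy != NULL)
        *copy = *toy_search(key);
    return copy;
}

/* Returns 1 if the private copy is intact after a reset, -1 on OOM. */
static int copy_survives_reset(void) {
    ToyTuple *a = toy_search_copy(1);
    if (a == NULL)
        return -1;
    toy_cache_reset();          /* cache flushed; our copy is untouched */
    (void) toy_search(2);
    int ok = a->valid == 1 && a->key == 1 && strcmp(a->name, "rel1") == 0;
    free(a);                    /* the overhead: one free per lookup */
    return ok;
}
```

The sketch also shows the maintenance risk mentioned above: nothing forces a caller to use toy_search_copy instead of toy_search, and forgetting the free() leaks memory.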

Another possibility is to introduce some sort of notion of a reference
count, and to make the standard usage pattern be

	tuple = SearchSysCacheTuple(...);
	... use tuple ...
	ReleaseSysCacheTuple(tuple);

The idea here is that a tuple with a positive refcount would not be
deleted during a cache reset, but would simply be removed from its
cache, and then finally deleted when released (or during elog
recovery).
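A minimal sketch of the proposed refcount scheme, under stated assumptions: toy_search and toy_release are hypothetical stand-ins for SearchSysCacheTuple and the proposed ReleaseSysCacheTuple, and the cache is a tiny array of heap-allocated entries rather than PostgreSQL's real structures. A reset frees unpinned entries at once, but only detaches pinned ones (marking them dead), and the final release frees them. The cleanup during elog recovery is not modeled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted toy cache; names are illustrative only. */
typedef struct {
    int  key;
    char name[16];
    int  refcount;   /* callers currently holding this tuple */
    int  dead;       /* 1 once removed from the cache while still pinned */
} ToyTuple;

#define CACHE_SLOTS 4
static ToyTuple *cache[CACHE_SLOTS];   /* NULL means empty slot */
static int live_tuples = 0;            /* allocated, not yet freed */

static void free_tuple(ToyTuple *t) { free(t); live_tuples--; }

/* Removes an entry from its slot: freed now if unpinned, else later. */
static void detach(ToyTuple **slot) {
    ToyTuple *t = *slot;
    *slot = NULL;
    if (t->refcount == 0)
        free_tuple(t);
    else
        t->dead = 1;                   /* toy_release will free it */
}

/* Like a cache lookup, but pins the result; caller must release it. */
static ToyTuple *toy_search(int key) {
    assert(key >= 0);
    ToyTuple **slot = &cache[key % CACHE_SLOTS];
    if (*slot != NULL && (*slot)->key != key)
        detach(slot);                  /* slot collision: make room */
    if (*slot == NULL) {
        ToyTuple *t = calloc(1, sizeof *t);
        if (t == NULL)
            return NULL;
        t->key = key;
        snprintf(t->name, sizeof t->name, "rel%d", key);
        live_tuples++;
        *slot = t;
    }
    (*slot)->refcount++;
    return *slot;
}

/* Drops one reference; frees the tuple if a reset already detached it. */
static void toy_release(ToyTuple *t) {
    assert(t->refcount > 0);
    if (--t->refcount == 0 && t->dead)
        free_tuple(t);
}

/* Cache reset: pinned entries survive as detached "dead" tuples. */
static void toy_cache_reset(void) {
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i] != NULL)
            detach(&cache[i]);
}

/* Returns 1 if a pinned tuple stays usable across a reset, -1 on OOM. */
static int pinned_survives_reset(void) {
    ToyTuple *a = toy_search(1);
    if (a == NULL)
        return -1;
    toy_cache_reset();                 /* a is detached, not freed */
    ToyTuple *b = toy_search(2);
    if (b == NULL) {
        toy_release(a);
        return -1;
    }
    int ok = a->key == 1 && a->dead == 1 && strcmp(a->name, "rel1") == 0;
    toy_release(b);
    toy_release(a);                    /* last reference: freed here */
    return ok;
}
```

After pinned_survives_reset() the detached tuple has been freed and only key 2 remains cached (live_tuples == 1); a further reset frees that unpinned entry immediately. Since a pinned tuple cannot disappear underneath its holder, the pin gives the same safety as a private copy for read-only use, without the copy.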

This might allow us to get rid of SearchSysCacheTupleCopy, too,
since the refcount should be just as good as palloc'ing one's own
copy for most purposes.

I haven't looked at the callers of SearchSysCacheTuple to see whether
this would be a practical change to make. I was wondering if anyone
had any comments or better ideas...

regards, tom lane
