Re: Recovering from detoast-related catcache invalidations

From: Xiaoran Wang <fanfuxiaoran(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: Recovering from detoast-related catcache invalidations
Date: 2024-01-11 17:52:21
Message-ID: CAGjhLkOoBEC9mLsnB42d3CO1vcMx71MLSEuigeABbQ8oRdA6gw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

>> BTW, while nosing around I found what seems like a very nasty related
>> bug. Suppose that a catalog tuple being loaded into syscache contains
>> some toasted fields. CatalogCacheCreateEntry will flatten the tuple,
>> involving fetches from toast tables that will certainly cause
>> AcceptInvalidationMessages calls. What if one of those should have
>> invalidated this tuple? We will not notice, because it's not in
>> the hashtable yet. When we do add it, we will mark it not-dead,
>> meaning that the stale entry looks fine and could persist for a long
>> while.
I spent some time trying to understand the bug and finally, I can reproduce
it locally with the following steps:

step1:
create a function called 'test' with a long body that must be stored in a
toast table.
and put it in schema 'yy' by : "alter function test set schema yy";

step 2:
I added a breakpoint at 'toast_flatten_tuple' for session1 ,
then execute the following SQL:
----------
set search_path='public';
alter function test set schema xx;
----------
step 3:
when the session1 stops at the breakpoint, I open session2 and execute
-----------
set search_path = 'yy';
alter function test set schema public;
-----------
step4:
resume the session1 , it reports the error "ERROR: could not find a
function named "test""

step 5:
continue to execute "alter function test set schema xx;" in session1, but
it still can not work and report the above error although the function test
already belongs to schema 'public'

Obviously, in session 1, the "test" proc tuple in the cache is outdated.

>> The detection of "get an invalidation" could be refined: what I did
>> here is to check for any advance of SharedInvalidMessageCounter,
>> which clearly will have a significant number of false positives.
>> However, the only way I see to make that a lot better is to
>> temporarily create a placeholder catcache entry (probably a negative
>> one) with the same keys, and then see if it got marked dead.
>> This seems a little expensive, plus I'm afraid that it'd be actively
>> wrong in the recursive-lookup cases that the existing comment in
>> SearchCatCacheMiss is talking about (that is, the catcache entry
>> might mislead any recursive lookup that happens).

I have reviewed your patch, and it looks good. But instead of checking for
any advance of SharedInvalidMessageCounter ( if the invalidate message is
not related to the current tuple, it is a little expensive) I have another
idea: we can recheck the visibility of the tuple with CatalogSnapshot(the
CatalogSnapthot must be refreshed if there is any SharedInvalidMessages) if
it is not visible, we re-fetch the tuple, otherwise, we can continue to use
it as it is not outdated.

I added a commit based on your patch and attached it.

Attachment Content-Type Size
0001-Recheck-the-tuple-visibility.patch application/octet-stream 1.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-01-11 17:56:36 pgsql: Add new pg_walsummary tool.
Previous Message Robert Haas 2024-01-11 17:37:58 Re: Stack overflow issue