Re: Subscription test 013_partition.pl fails under CLOBBER_CACHE_ALWAYS

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Subscription test 013_partition.pl fails under CLOBBER_CACHE_ALWAYS
Date: 2020-09-15 19:46:18
Message-ID: 1353598.1600199178@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> [ $subject ]

I found some time to trace this down, and what it turns out to be is
that apply_handle_truncate() is making use of a LogicalRepRelMapEntry's
localreloid field without any consideration for the possibility that
that's been set to zero as a result of a cache flush. The visible
symptom of "cache lookup failed for relation 0" comes from trying
to invoke find_all_inheritors with a zero OID.

Now, study of apply_handle_truncate doesn't immediately reveal where
a cache flush could have occurred, but I realized that it's actually
possible that the LogicalRepRelMapEntry is *already* marked invalid
when logicalrep_rel_open() returns! That's because for some reason
it does GetSubscriptionRelState last, after it's already marked the
entry valid, and that function does plenty o' catalog accesses.
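
The ordering hazard can be sketched with a toy model. This is not the PostgreSQL source: the struct, the `toy_*` names, and the OID value in the usage note are all hypothetical, and the model assumes that every catalog access triggers a flush, which is roughly what CLOBBER_CACHE_ALWAYS forces. The second ordering is included only for contrast, not as a proposed fix, since a flush could still arrive later.

```c
#include <stdbool.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Toy stand-in for LogicalRepRelMapEntry; only the field discussed above. */
typedef struct ToyRelMapEntry
{
    Oid localreloid;    /* zero doubles as "needs revalidation" */
} ToyRelMapEntry;

/* Toy invalidation callback: a cache flush zeroes localreloid. */
static void
toy_invalidate(ToyRelMapEntry *entry)
{
    entry->localreloid = InvalidOid;
}

/*
 * Hypothetical stand-in for a function that does catalog accesses, such as
 * GetSubscriptionRelState.  Assumption: each access behaves as if a cache
 * flush happened, which we model by calling the callback directly.
 */
static void
toy_catalog_access(ToyRelMapEntry *entry)
{
    toy_invalidate(entry);
}

/* Ordering described above: mark the entry valid, then touch the catalogs. */
static Oid
toy_rel_open_state_last(ToyRelMapEntry *entry, Oid reloid)
{
    entry->localreloid = reloid;    /* entry now looks valid */
    toy_catalog_access(entry);      /* ... and is invalidated before return */
    return entry->localreloid;
}

/* Contrast only: touch the catalogs first, then mark the entry valid. */
static Oid
toy_rel_open_state_first(ToyRelMapEntry *entry, Oid reloid)
{
    toy_catalog_access(entry);
    entry->localreloid = reloid;
    return entry->localreloid;
}
```

In this model, calling `toy_rel_open_state_last(&e, 16384)` hands back a zero OID even though the function has just "validated" the entry, which is the state a caller like apply_handle_truncate() would then trip over.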

It's not really clear to me why setting localreloid to zero is a sane
way to represent "this entry needs to be revalidated". I think a
separate flag would be more appropriate. Once we hold a lock on the
target relation, it seems to me that no interesting changes should
be possible while the lock is held; so there's no very good reason
to destroy useful state to remind ourselves that we should recheck
it next time.
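
A minimal sketch of the separate-flag idea, again as a toy model rather than a patch: the `FlagRelMapEntry` struct, the `localrelvalid` field name, and the `flag_*` helpers are hypothetical names chosen for illustration. The point is only that invalidation clears a boolean and leaves the OID intact.

```c
#include <stdbool.h>

typedef unsigned int Oid;

/* Toy entry: validity is tracked separately from the relation OID. */
typedef struct FlagRelMapEntry
{
    Oid  localreloid;       /* preserved across invalidation */
    bool localrelvalid;     /* hypothetical flag: false = recheck on next open */
} FlagRelMapEntry;

/* Invalidation only clears the flag; localreloid is left untouched. */
static void
flag_invalidate(FlagRelMapEntry *entry)
{
    entry->localrelvalid = false;
}

/* The open path would consult this instead of testing for a zero OID. */
static bool
flag_needs_revalidation(const FlagRelMapEntry *entry)
{
    return !entry->localrelvalid;
}

/* After rechecking the local relation, mark the entry usable again. */
static void
flag_revalidate(FlagRelMapEntry *entry)
{
    entry->localrelvalid = true;
}
```

With this shape, code that already holds the lock can keep using localreloid after a flush, and the next open still sees that a recheck is due.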

regards, tom lane
