Quick Links

Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE
Date:	2026-06-26 20:30:23
Message-ID:	CAD21AoB6mn0pJ9fynY82M54Rxjj7hju4cNfSv+hx5Up4nK0XNg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Jun 23, 2026 at 3:40 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Tue, Jun 16, 2026 at 6:20 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > > However, I would like to optimize this part of the code by 1/ gating
> > > it with InvalidOid checkup, 2/ moving the schemaid fetch closer to
> > > where it's being used i.e. inside if (!pub->alltables). I would like
> > > to do this only for HEAD, so, I attached it as a separate 0002 patch.
> > > For pre-HEAD versions, I removed this check and retained the
> > > try_table_open() fix.
> >
> > I'm not sure skipping get_rel_namespace() contributes to a visible
> > performance gain. Rather I'm concerned that checking
> > SearchSysCacheExists2(PUBLICATIONNAMESPACEMAP) with
> > schemaid=InvalidOid could lead to unexpected results.
>
> Thanks for reviewing!
>
> AFAICS it won't give unexpected results. The first time it looks up a
> publication with an invalid schema OID, it would cause a cache miss
> and leave a negative cache entry - but that's per publication, not per
> relation, so negligible.
>
> That said, with the approach of using try_table_open for all tables
> regardless and getting the schema from the open relation via
> RelationGetNamespace, schemaid is always valid by the time it reaches
> the syscache lookup (might be worth adding an assertion here). So the
> get_rel_namespace optimization (0002) can be dropped entirely.
>
> > I'm concerned about the fact that the patch handles only the case
> > where the table doesn't have the column list. The tables in the result
> > not having the column list are guaranteed to be present since we do
> > the existence check for them but others are not. I guess it's reliable
> > if we could call try_table_open() for all tables in the table_infos,
> > and check its namespace using RelationGetNamespace(). I think it
> > doesn't lead to noticeable overheads in practice as we're already
> > calling table_open() for all tables in ALL TABLES publications or ALL
> > TABLE IN SCHEMA publications where users cannot specify the column
> > list. What do you think?
>
> My earlier argument was that no function returning table OIDs can
> guarantee they remain valid - a drop can happen right after we return
> the OID, and accuracy is in the caller's hands. All the callers of
> pg_get_publication_tables already handle this by JOINing with
> pg_class.
>
> However, a closer look at other functions that either build a list of
> table OIDs (expand_partitioned_rtentry) or work on previously built
> table OIDs (vacuum_open_relation) proves me wrong - they all account
> for concurrent table drops with try_table_open. So, I'm convinced to
> add try_table_open in pg_get_publication_tables for all the tables
> regardless, unless I'm missing something here.

>
> > > > If we use tuplestore instead of SRF, can we simplify the code as we
> > > > would not need publication_tables_state and the above check? It would
> > > > be only for the master, though.
> > >
> > > I implemented this idea for HEAD and it simplifies the code a bit.
> >
> > Sorry, I should have mentioned that the changes to use tuplestore
> > instead of SRF should be PG20. I think that we should not such a
> > refactoring even PG19.
>
> I will drop the tuplestore changes for now and repost them as a
> refactoring patch after the PG20 dev branch is cut.
>
> Thoughts?

+1. We can revisit the tuplestore idea for PG20, possibly in a
separate thread. For bug fixing, let's focus on making it simple and
less invasive.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE at 2026-06-23 22:40:12 from Bharath Rupireddy

Browse pgsql-hackers by date

	From	Date	Subject
Previous Message	Zsolt Parragi	2026-06-26 20:19:12	Re: More jsonpath methods: translate, split, join