Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE
Date: 2026-04-24 05:49:40
Message-ID: CAJpy0uA8m+URZivj_SK9VdhQgW2sqJdEVCb9c_n34BskiWWxaA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 23, 2026 at 4:45 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Apr 23, 2026 at 1:01 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > I came across a race condition in pg_get_publication_tables with
> > concurrent DROP TABLE. pg_get_publication_tables collects table OIDs
> > without locks on the first call, then opens each table on later calls.
> > If a table is dropped in between, the function errors with "could not
> > open relation with OID".
> >
>
> I agree with the problem statement; this is a weird error:
>
> postgres=# select * from pg_publication_tables;
> ERROR: could not open relation with OID 16390
>
> > This is common in environments where many tables are being created and
> > dropped while pg_publication_tables is queried, such as with FOR ALL
> > TABLES publications.
> > Please find the attached patch that fixes this by skipping
> > concurrently dropped tables instead of erroring out. Tables created
> > after the list is built are simply not present in the result set,
> > which is expected point-in-time behavior with no error.
>
> I too think that this should be fixed by skipping the dropped table.
> Will review the patch soon.
>

Bharath, I reviewed the patch. I personally think that manually
incrementing the SRF call counter (funcctx->call_cntr++) in
pg_get_publication_tables() is not a good idea. I think these fields
should be treated as read-only by us, and any change to them should go
through the SRF macros.

I tried to find whether any other code does this and found one reference
in hstore_svals():

/* ugly ugly ugly. why no macro for this? */
(funcctx)->call_cntr++;

Having said that, I could not find any other way to implement the fix
either. Did you try exploring SRF_RETURN_NEXT_NULL for this case? I am
not very sure about this either, as it will end up returning NULL, which
may impact the output and its use in tablesync too.
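
For illustration only, one possible shape for skipping a dropped table
without touching call_cntr is to keep a private cursor in the per-query
state (the kind of struct an SRF would hang off funcctx->user_fctx) and
loop past OIDs that no longer resolve. The sketch below is a standalone
toy model in plain C, not PostgreSQL code: ToyCatalog, toy_rel_exists(),
PubTablesState and next_live_table() are all hypothetical names, and
whether this shape actually fits pg_get_publication_tables() is untested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Hypothetical stand-in for the catalog: the OIDs that still exist. */
typedef struct ToyCatalog
{
    const Oid  *live;
    size_t      nlive;
} ToyCatalog;

/* Hypothetical existence check; a real SRF would try to open the relation. */
static bool
toy_rel_exists(const ToyCatalog *cat, Oid relid)
{
    for (size_t i = 0; i < cat->nlive; i++)
    {
        if (cat->live[i] == relid)
            return true;
    }
    return false;
}

/* Private per-query state, analogous to what would live in user_fctx. */
typedef struct PubTablesState
{
    const Oid  *oids;       /* OIDs collected on the first call */
    size_t      noids;
    size_t      next;       /* private cursor, independent of call_cntr */
} PubTablesState;

/*
 * Store the next surviving OID in *out and return true, or return false
 * once the collected list is exhausted. Dropped OIDs are skipped inside
 * the loop, so the caller never sees them.
 */
static bool
next_live_table(PubTablesState *st, const ToyCatalog *cat, Oid *out)
{
    assert(st != NULL && cat != NULL && out != NULL);

    while (st->next < st->noids)
    {
        Oid relid = st->oids[st->next++];

        if (toy_rel_exists(cat, relid))
        {
            *out = relid;
            return true;
        }
        /* relation was dropped after the list was built: skip it */
    }
    return false;
}
```

With collected OIDs {16384, 16390, 16395} and 16390 dropped in between,
successive calls yield 16384 and 16395 and then report completion, so no
NULL row is emitted for the dropped table.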

Let's see what others have to say on this fix.

thanks
Shveta
