Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE
Date: 2026-05-12 21:20:14
Message-ID: CAD21AoC2oZKm4nyqb4McYMXvEBSx5RNv9ZWnYRKESmscoixTMQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Sun, May 3, 2026 at 7:35 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Wed, Apr 29, 2026 at 12:15 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > Fixed. Please find the attached v5 patch.
> >
> > The fix is needed only for PG16 and later, not PG15 or PG14. The bug
> > was introduced by b7ae03953690 [1] in PG16, which added a table_open()
> > call in pg_get_publication_tables(). PG15 and earlier only use
> > get_rel_namespace() and syscache lookups, both of which gracefully
> > handle dropped relations (returning InvalidOid/false rather than
> > erroring).
> >
> > I verified the bug and the fix on all affected branches. Please find
> > the attached version-specific patches for backpatching. Thank you!
> >
> > [1] b7ae03953690 - Ignore dropped and generated columns from the column list
>
> Please find the attached v6 patch, which fixes a test failure on
> FreeBSD. This variant of the build forces parallel query via
> debug_parallel_query=regress, causing the pg_get_publication_tables
> injection point to fire in parallel workers as well. I disabled forced
> parallel query for this test case.
>

I reviewed the patch and here are some comments:

+/* State for pg_get_publication_tables SRF */
+typedef struct
+{
+ List *table_infos; /* list of published_rel */
+ int curr_idx; /* current index into table_infos */
+} publication_tables_state;

I think we can define publication_table_state in
pg_get_publication_tables() as it's used only in that function.

---
+ /* Skip if the relation has been concurrently dropped. */
+ if (!OidIsValid(schemaid))
+ continue;

Although this check is done for all relations in table_infos, we also
check the return value of try_table_open(), and these two checks have
the same comment. I think we need more comments on why these two
checks are required.

In which case is schemaid InvalidOid and do we not call
try_table_open() (i.e., nulls[2] is false)?

It might make more sense to get-and-check the schemaid of each
relation when adding the table information to table_infos.

---
Looking at the regression tests in the patch, it tests the ALL TABLES
publication cases and the concurrently-dropped table is handled when
we call check the return value of try_table_open() but not
get_rel_namespace(). I think the test case itself is fine but we can
do the same test without adding a new injection point. For instance,

backend-1: begin;
backend-2: begin; lock table t_dropme in access exclusive mode;
backend-1: select * from pg_publication_tables;
backend-2: drop table t_dropme; commit;
backend-1: get an error "could not open relation with XXX"

---
If we use tuplestore instead of SRF, can we simplify the code as we
would not need publication_tables_state and the above check? It would
be only for the master, though.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2026-05-12 21:20:25 Re: [PATCH] libpq: try all addresses for a host before moving to next on target_session_attrs mismatch
Previous Message Zsolt Parragi 2026-05-12 21:05:45 Re: Feature: Use DNS SRV records for connecting