Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE
Date: 2026-06-15 23:21:31
Message-ID: CALj2ACVc3G118VfMtS04JwE0c0Y_vsVwh3LVAMAVgBR6qNFo8g@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, May 12, 2026 at 2:20 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I reviewed the patch and here are some comments:

Thank you for reviewing!

> +/* State for pg_get_publication_tables SRF */
> +typedef struct
> +{
> + List *table_infos; /* list of published_rel */
> + int curr_idx; /* current index into table_infos */
> +} publication_tables_state;
>
> I think we can define publication_table_state in
> pg_get_publication_tables() as it's used only in that function.

Done for pre-HEAD versions. For HEAD, I used the SRF approach.

> ---
> + /* Skip if the relation has been concurrently dropped. */
> + if (!OidIsValid(schemaid))
> + continue;
>
> Although this check is done for all relations in table_infos, we also
> check the return value of try_table_open(), and these two checks have
> the same comment. I think we need more comments on why these two
> checks are required.

When get_rel_namespace() returns InvalidOid for the concurrently
dropped tables, it costs additional syscache lookups plus index scans
(cache miss). So, there's no harm from the correctness perspective.

However, I would like to optimize this part of the code by 1/ gating
it with InvalidOid checkup, 2/ moving the schemaid fetch closer to
where it's being used i.e. inside if (!pub->alltables). I would like
to do this only for HEAD, so, I attached it as a separate 0002 patch.
For pre-HEAD versions, I removed this check and retained the
try_table_open() fix.

> In which case is schemaid InvalidOid and do we not call
> try_table_open() (i.e., nulls[2] is false)?

This case is NOT possible, because the drop table ensures dependent
catalog entries (pg_publication_namespace in this case) are deleted
too.

> It might make more sense to get-and-check the schemaid of each
> relation when adding the table information to table_infos.

That won't help fix the table_open() error with concurrent table drops.

> ---
> Looking at the regression tests in the patch, it tests the ALL TABLES
> publication cases and the concurrently-dropped table is handled when
> we call check the return value of try_table_open() but not
> get_rel_namespace(). I think the test case itself is fine but we can
> do the same test without adding a new injection point. For instance,
>
> backend-1: begin;
> backend-2: begin; lock table t_dropme in access exclusive mode;
> backend-1: select * from pg_publication_tables;
> backend-2: drop table t_dropme; commit;
> backend-1: get an error "could not open relation with XXX"

Nice! It works. I used this for the test-case in the new patches.

> ---
> If we use tuplestore instead of SRF, can we simplify the code as we
> would not need publication_tables_state and the above check? It would
> be only for the master, though.

I implemented this idea for HEAD and it simplifies the code a bit.

Please find the attached v7 patches.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v7-0001-Fix-pg_get_publication_tables-race-with-concurren.patch application/x-patch 13.6 KB
v7-0002-Optimize-schema-lookup-in-pg_get_publication_tabl.patch application/x-patch 2.4 KB
v7-0001-PG18-Fix-pg_get_publication_tables-race-with-conc.patch application/x-patch 7.3 KB
v7-0001-PG17-Fix-pg_get_publication_tables-race-with-conc.patch application/x-patch 7.0 KB
v7-0001-PG16-Fix-pg_get_publication_tables-race-with-conc.patch application/x-patch 7.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message surya poondla 2026-06-15 23:22:06 Re: [BUG] Take a long time to reach consistent after pg_rewind
Previous Message Jacob Champion 2026-06-15 23:20:02 Re: Heads Up: cirrus-ci is shutting down June 1st