| From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE |
| Date: | 2026-06-15 23:21:31 |
| Message-ID: | CALj2ACVc3G118VfMtS04JwE0c0Y_vsVwh3LVAMAVgBR6qNFo8g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On Tue, May 12, 2026 at 2:20 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I reviewed the patch and here are some comments:
Thank you for reviewing!
> +/* State for pg_get_publication_tables SRF */
> +typedef struct
> +{
> + List *table_infos; /* list of published_rel */
> + int curr_idx; /* current index into table_infos */
> +} publication_tables_state;
>
> I think we can define publication_table_state in
> pg_get_publication_tables() as it's used only in that function.
Done for pre-HEAD versions. For HEAD, I used the SRF approach.
> ---
> + /* Skip if the relation has been concurrently dropped. */
> + if (!OidIsValid(schemaid))
> + continue;
>
> Although this check is done for all relations in table_infos, we also
> check the return value of try_table_open(), and these two checks have
> the same comment. I think we need more comments on why these two
> checks are required.
When get_rel_namespace() returns InvalidOid for the concurrently
dropped tables, it costs additional syscache lookups plus index scans
(cache miss). So, there's no harm from the correctness perspective.
However, I would like to optimize this part of the code by 1/ gating
it with InvalidOid checkup, 2/ moving the schemaid fetch closer to
where it's being used i.e. inside if (!pub->alltables). I would like
to do this only for HEAD, so, I attached it as a separate 0002 patch.
For pre-HEAD versions, I removed this check and retained the
try_table_open() fix.
> In which case is schemaid InvalidOid and do we not call
> try_table_open() (i.e., nulls[2] is false)?
This case is NOT possible, because the drop table ensures dependent
catalog entries (pg_publication_namespace in this case) are deleted
too.
> It might make more sense to get-and-check the schemaid of each
> relation when adding the table information to table_infos.
That won't help fix the table_open() error with concurrent table drops.
> ---
> Looking at the regression tests in the patch, it tests the ALL TABLES
> publication cases and the concurrently-dropped table is handled when
> we call check the return value of try_table_open() but not
> get_rel_namespace(). I think the test case itself is fine but we can
> do the same test without adding a new injection point. For instance,
>
> backend-1: begin;
> backend-2: begin; lock table t_dropme in access exclusive mode;
> backend-1: select * from pg_publication_tables;
> backend-2: drop table t_dropme; commit;
> backend-1: get an error "could not open relation with XXX"
Nice! It works. I used this for the test-case in the new patches.
> ---
> If we use tuplestore instead of SRF, can we simplify the code as we
> would not need publication_tables_state and the above check? It would
> be only for the master, though.
I implemented this idea for HEAD and it simplifies the code a bit.
Please find the attached v7 patches.
--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com
| Attachment | Content-Type | Size |
|---|---|---|
| v7-0001-Fix-pg_get_publication_tables-race-with-concurren.patch | application/x-patch | 13.6 KB |
| v7-0002-Optimize-schema-lookup-in-pg_get_publication_tabl.patch | application/x-patch | 2.4 KB |
| v7-0001-PG18-Fix-pg_get_publication_tables-race-with-conc.patch | application/x-patch | 7.3 KB |
| v7-0001-PG17-Fix-pg_get_publication_tables-race-with-conc.patch | application/x-patch | 7.0 KB |
| v7-0001-PG16-Fix-pg_get_publication_tables-race-with-conc.patch | application/x-patch | 7.0 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | surya poondla | 2026-06-15 23:22:06 | Re: [BUG] Take a long time to reach consistent after pg_rewind |
| Previous Message | Jacob Champion | 2026-06-15 23:20:02 | Re: Heads Up: cirrus-ci is shutting down June 1st |