| From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
|---|---|
| To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
| Subject: | Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE |
| Date: | 2026-04-24 05:49:40 |
| Message-ID: | CAJpy0uA8m+URZivj_SK9VdhQgW2sqJdEVCb9c_n34BskiWWxaA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Thu, Apr 23, 2026 at 4:45 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Thu, Apr 23, 2026 at 1:01 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > I came across a race condition in pg_get_publication_tables with
> > concurrent DROP TABLE. pg_get_publication_tables collects table OIDs
> > without locks on the first call, then opens each table on later calls.
> > If a table is dropped in between, the function errors with "could not
> > open relation with OID".
> >
>
> I agree with the problem statement, this is a weird error:
>
> postgres=# select * from pg_publication_tables;
> ERROR: could not open relation with OID 16390
>
> > This is common in environments where many tables are being created and
> > dropped while pg_publication_tables is queried, such as with FOR ALL
> > TABLES publications.
> > Please find the attached patch that fixes this by skipping
> > concurrently dropped tables instead of erroring out. Tables created
> > after the list is built are simply not present in the result set,
> > which is expected point-in-time behavior with no error.
>
> I too think that this should be fixed by skipping the dropped table.
> Will review the patch soon.
>
Bharath, I reviewed the patch. I personally think that manually
incrementing the SRF call counter (funcctx->call_cntr++) in
pg_get_publication_tables() is not a good idea. I think these fields
are read-only for us, and any changes to SRF fields must go through
the SRF macros. I tried to find whether any other code does this and
found one reference, in hstore_svals():
/* ugly ugly ugly. why no macro for this? */
(funcctx)->call_cntr++;
Having said that, I could not find any other way to implement the fix
either. Did you try exploring 'SRF_RETURN_NEXT_NULL' in this case? I am
not very sure about that option either, as it will end up returning
NULL rows, which may affect the output and its usage in tablesync too.
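
To make the two options concrete, here is a rough standalone sketch. It
is not PostgreSQL code and not the patch: ToyFuncCtx, toy_rel_exists()
and the other toy_* names are hypothetical stand-ins, and the only thing
taken from the real SRF macros is that SRF_RETURN_NEXT and
SRF_RETURN_NEXT_NULL advance call_cntr themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the SRF call context; NOT PostgreSQL's FuncCallContext. */
typedef struct ToyFuncCtx
{
    size_t          call_cntr;  /* entries consumed so far */
    size_t          max_calls;  /* entries collected on the first call */
    const unsigned *oids;       /* OIDs collected without locks */
} ToyFuncCtx;

typedef enum { TOY_DONE, TOY_VALUE, TOY_NULL } ToyRow;

/* Hypothetical existence check: OID 16390 plays the concurrently dropped table. */
static bool
toy_rel_exists(unsigned oid)
{
    return oid != 16390;
}

/* Option A: skip dropped entries by bumping the counter by hand. */
static ToyRow
toy_next_skip(ToyFuncCtx *ctx, unsigned *oid_out)
{
    assert(ctx != NULL && oid_out != NULL);
    while (ctx->call_cntr < ctx->max_calls)
    {
        unsigned oid = ctx->oids[ctx->call_cntr];

        if (!toy_rel_exists(oid))
        {
            ctx->call_cntr++;   /* the manual bump under discussion */
            continue;
        }
        *oid_out = oid;
        ctx->call_cntr++;       /* models the bump done by SRF_RETURN_NEXT */
        return TOY_VALUE;
    }
    return TOY_DONE;
}

/* Option B: one result per entry; dropped entries become a NULL row. */
static ToyRow
toy_next_null(ToyFuncCtx *ctx, unsigned *oid_out)
{
    assert(ctx != NULL && oid_out != NULL);
    if (ctx->call_cntr >= ctx->max_calls)
        return TOY_DONE;

    unsigned oid = ctx->oids[ctx->call_cntr];

    ctx->call_cntr++;           /* models the bump done by SRF_RETURN_NEXT[_NULL] */
    if (!toy_rel_exists(oid))
        return TOY_NULL;
    *oid_out = oid;
    return TOY_VALUE;
}

/* Run one strategy over a fixed OID list and count what the caller sees. */
static void
toy_drain(ToyRow (*next)(ToyFuncCtx *, unsigned *), size_t *values, size_t *nulls)
{
    static const unsigned oids[] = {16384, 16390, 16400};
    ToyFuncCtx  ctx = {0, sizeof(oids) / sizeof(oids[0]), oids};
    unsigned    oid = 0;
    ToyRow      row;

    *values = *nulls = 0;
    while ((row = next(&ctx, &oid)) != TOY_DONE)
    {
        if (row == TOY_VALUE)
            (*values)++;
        else
            (*nulls)++;
    }
}
```

With the list {16384, 16390, 16400}, toy_next_skip hands back two rows,
while toy_next_null hands back the same two rows plus one NULL row that
every consumer of the result would have to tolerate.
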
Let's see what others have to say on this fix.
thanks
Shveta