| From: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
|---|---|
| To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
| Cc: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_createsubscriber: Fix incorrect handling of cleanup flags |
| Date: | 2026-05-12 10:56:29 |
| Message-ID: | CABdArM4gOAVKayZvTa-pyyCY7519zWe7Hx4cg_enEr32JdS6xQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, May 12, 2026 at 7:33 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Tue, May 6, 2025 at 3:41 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> >
> > On Mon, May 5, 2025 at 11:39 AM David G. Johnston
> > <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> > >
> > > On Sun, May 4, 2025 at 8:45 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> > >>
> > >> Attached is the patch implementing the above proposed solution.
> > >> Reviews and feedback are most welcome.
> > >
> > >
> > > I feel like this is just papering over the issue - which is that these two drop functions are being used for multiple differently named publications/slots yet take no care to ensure they only change made_publication and made_repslot if the name of the object being passed in matches the name of the object those two booleans are specifically tracking (the application created objects on the publisher).
> > >
> > > Make it so they are only changed to false if the name matches the one the program created and the connection is the primary connection. That targets the real issue and avoids using a branching boolean parameter.
> > >
> > > It seems really odd to say: if (in_cleanup) "don't try again" - since by definition this is the last thing we are doing before we exit. So really what this patch can do more simply is just remove the dbinfo->made_replslot=false and *made_publication=false lines altogether - which might be a valid option.
> > >
> >
> > +1 to removing the dbinfo->made_replslot=false and
> > *made_publication=false lines. In my tests, I attempted to force
> > multiple failures, but couldn’t find any case where
> > cleanup_objects_atexit() would recurse or cause repeated cleanup if
> > these flags remain set to true.
> >
> > > I'm partial to the latter really, I don't think the error message output for retrying a drop that may have previously failed would be an issue.
> > >
> >
> > As of now, we don’t attempt to drop the same object more than once, so
> > the latter approach does seem reasonable to me. That said, I’m unsure
> > why the flags were being reset here in the first place.
> >
>
> Hi Nisha,
>
Hi Peter, thank you for looking into it.
> From what I can tell, this flag resetting might have originated from
> old review comments of mine [1 #5b] and also [2 #6].
>
These flag resettings were already present since commit d44032d[1],
which introduced pg_createsubscriber. The changes discussed in your
comments for commit e5aeed4[2] did not modify their behavior.
> Apparently, it wasn't so much about preventing multiple errors for
> fdrops that had "previously failed"... it was more intended to prevent
> attempting to delete the same publication 2x (1st successfully, then
> 2nd time failing because it was already gone)
>
> Anyway, the code has probably changed lots since those review
> comments; I have no idea if my suspected double-delete at that time is
> still possible or not. Those old review comments might give you some
> clues on how to reproduce now that you have removed the flag
> resetting.
>
Let me first briefly explain why this patch still holds:
The made_publication and made_replslot are publisher-side cleanup
flags. They are checked only in cleanup_objects_atexit(), which
executes at most once and connects to the publisher to remove objects
created by the tool. They have no inherent relationship to the
subscriber.
That said, check_and_drop_publications() always operates on the
subscriber, so resetting made_publication because of a failure on the
subscriber side, when that same flag is later used for publisher-side
cleanup, does not seem correct to me as it could incorrectly skip the
required cleanup. A failure on one server should not affect cleanup
decisions for another server. The same is true for the made_replslot
flag in respective code.
After looking further, I noticed that since a recent commit
85ddcc2[3], check_and_drop_publications() also reads made_publication
to decide whether the subscriber-side inherited publication should be
dropped or preserved. In that sense, the dual usage of the flag looks
valid and continues to work correctly with my patch.
Also, after checking on pg_head, I don’t think there is a possibility
of a double-drop on the subscriber side. Either all publications are
dropped in the if (drop_all_pubs) block when "--remove=publications"
is specified, or by default the else block drops only dbinfo->pubname
(the FOR ALL TABLES publication). The if (made_publication) condition
simply guards against dropping a pre-existing publication. I don’t see
an issue there, but please let me know if I’m missing something.
[1] https://github.com/postgres/postgres/commit/d44032d
[2] https://github.com/postgres/postgres/commit/e5aeed4
[3] https://github.com/postgres/postgres/commit/85ddcc2
--
Thanks,
Nisha
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Antonin Houska | 2026-05-12 11:08:20 | Re: Adding REPACK [concurrently] |
| Previous Message | Yugo Nagata | 2026-05-12 10:30:11 | Re: Infinite Autovacuum loop caused by failing virtual generated column expression |