| From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
|---|---|
| To: | Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> |
| Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Warn when creating or enabling a subscription with max_logical_replication_workers = 0 |
| Date: | 2026-02-06 07:28:06 |
| Message-ID: | CAHut+PtmxmkP4acNT0b8b49vySXSEAARvDFYuqbyBWBho4iH+g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Feb 6, 2026 at 3:37 PM Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
>
> On Fri, 6 Feb 2026 09:58:02 +1100
> Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> > On Thu, Feb 5, 2026 at 7:12 PM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > On Thursday, February 5, 2026 3:47 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > > On Thu, Feb 5, 2026 at 12:12 PM Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
> > > > >
> > > > > On Wed, 4 Feb 2026 17:26:25 +1100
> > > > > Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > > >
> > ...
> > > >
> > > > Oh right, I mistook that you had run out of logical replication "workers", but in
> > > > fact, because max_logical_replication_workers = 0 the main "logical
> > > > replication launcher" process had failed to start, so logical replication was
> > > > entirely disabled.
> > > >
> > > > See code: in backend/replication/logical/launcher.c
> > > >
> > > > ApplyLauncherRegister(void)
> > > > {
> > > > ...
> > > > if (max_logical_replication_workers == 0 || IsBinaryUpgrade)
> > > > return;
> > > >
> > > > ~~~
> > > >
> > > > Given this, I felt that instead of testing the GUC, what you really want to know
> > > > is just whether that "logical replication launcher" is running or not.
> > > >
> > > > And that launcher pid is already tested when the Subscription commands send
> > > > a "kill" to the launcher. e.g. see function ApplyLauncherWakeup.
> > > >
> > > > So, here is a diff patch, of what I tried:
> > > >
> > > > ------
> > > > diff --git a/src/backend/replication/logical/launcher.c
> > > > b/src/backend/replication/logical/launcher.c
> > > > index 3ed86480be2..f880380ce4e 100644
> > > > --- a/src/backend/replication/logical/launcher.c
> > > > +++ b/src/backend/replication/logical/launcher.c
> > > > @@ -1195,6 +1195,13 @@ ApplyLauncherWakeup(void) {
> > > > if (LogicalRepCtx->launcher_pid != 0)
> > > > kill(LogicalRepCtx->launcher_pid, SIGUSR1);
> > > > + else
> > > > + {
> > > > + if (max_logical_replication_workers == 0)
> > > > + ereport(WARNING,
> > > > + errmsg("Logical replication is
> > > > currently disabled"),
> > > > + errhint("\"%s\" is 0.",
> > > > "max_logical_replication_workers"));
> > > > + }
> > > > }
> > > > ------
> > > >
> > > > Thoughts?
> > >
> > > I think this is not the right place to check this issue. The launcher might fail
> > > for some reasons and restart soon (pid will be set to 0), in which case this
> > > warning wouldn't be appropriate.
> >
> > AFAIK, that's not possible. My warning is guarded by checking
> > max_logical_replication_workers == 0. And in that case, the launcher
> > cannot "fail" because it was never registered/started in the first
> > place.
>
> I also initially considered emitting the warning in
> ApplyLauncherWakeup() after checking max_logical_replication_workers == 0.
> However, I think checking pid == 0 is sufficient.
>
> Even when max_logical_replication_workers is non-zero and the launcher is
> normally running, it could still be killed by user action or by the
> OS, although such cases should be rare. In that situation, emitting the
> same warning would not be appropriate.
Sorry, I did not understand the previous paragraph. If "when
max_logical_replication_workers is non-zero" then the warning cannot
be emitted inappropriately because that warning is guarded by "if
(max_logical_replication_workers == 0)" (??).
>
> I placed the logic in subscriptioncmds.c so that the warning message can
> better reflect the actual situation, while keeping consistency with
> existing messages such as "subscription was created, but is not
> connected".
In hindsgght your original patch is very similar to what I posted. I
agree, your messages are better because they are more specific. But
there are multiple calls to ApplyLauncherWakeup() -- not just those 2
that you handled - so I was just unsure if there are some remaining
holes.
>
> > >
> > > Besides, I also think it would make more sense to issue a warning if the
> > > subscription has no remaining workers to start instead of raising a
> > > warning for 0 setting (the latter seems rare).
> > >
> >
> > It might be rare, but by my understanding, the original post described
> > this specific scenario, whereby the user had previously deliberately
> > configured `max_logical_replication_workers` to 0. Then, some time
> > later, when they attempted CREATE/ALTER SUBSCRIPTION, nothing
> > happened, and there was only silence. If they'd forgotten about their
> > `max_logical_replication_workers` setting, then it could be confusing
> > why nothing was happening.
> >
> > OTOH, when max_logical_replication_workers > 0, then the logical
> > replication launcher would be running, and in that case, there are
> > already plenty of warning logs about not enough worker resources.
>
> Yes. In this case, warnings are emitted to the server log, but they do not
> appear as a response to CREATE/ALTER SUBSCRIPTION. There is an option to
> also emit such warnings during the CREATE/ALTER SUBSCRIPTION command, so
> I will update the patch accordingly.
>
> Nevertheless, this seems to be a different situation from the case where
> logical replication is disabled entirely, so I think the warning messages
> should be handled separately.
+1. I also think that running out of worker resources is a different
scenario from the entire logical replication being disabled, so it
should be handled separately
======
Kind Regards,
Peter Smith.
Fujitsu Australia
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2026-02-06 07:34:44 | Re: Concerns regarding code in pgstat_backend.c |
| Previous Message | Jakub Wartak | 2026-02-06 07:19:02 | Re: [PING] fallocate() causes btrfs to never compress postgresql files |