Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-10-17 12:07:18
Message-ID: CAJpy0uDtJERSqX7_G2s8r826qjwD61FNSfBiwq+tsXiPyevEcQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 17, 2025 at 12:47 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Oct 16, 2025 at 9:07 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Fri, Oct 17, 2025 at 8:55 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Oct 16, 2025 at 11:10 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Thu, Oct 16, 2025 at 1:41 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > >
> > > > Using PMSIGNAL_BACKGROUND_WORKER_CHANGE sounds like a misuse, since the
> > > > slotsync worker is not a background worker and logical decoding
> > > > activation is not related to bgworkers.
> > > >
> > > > An alternative idea is to launch the slotsync worker if the wal_level
> > > > value on the standby is >= replica, that is, always launch it on the
> > > > standby if sync_replication_slots is on. Both with and without the
> > > > patch, we don't shut down the slotsync worker even if logical decoding
> > > > gets disabled on the standby.
> > > >
> > >
> > > Are you talking about the case where wal_level on the primary has been
> > > reduced below logical and the user gets the following message on the standby:
> > > "logical decoding on standby requires \"wal_level\" >= \"logical\" on
> > > the primary"? If so, the slight difference in this case is that
> > > the standby still has wal_level logical.
> > >
> >
> > I believe what Sawada-san meant is that even when effective_wal_level
> > = replica on a standby, we should still allow the slot-sync worker to
> > start if 'sync_replication_slots' is enabled. This is because we
> > currently do not stop the worker when effective_wal_level on the
> > standby changes from logical to replica, so allowing it to start in
> > this case maintains consistent behavior. That said, my preference is
> > to not start the slot-sync worker if effective_wal_level is less than
> > logical. As I understand, this is already the behavior implemented in
> > the current patch.
>
> Exactly. Thank you for clarifying my comment.
>
> >
> > Regarding the scenario where effective_wal_level changes from logical
> > to replica on a standby, my vote is to explicitly shut down the
> > slot-sync worker in such cases. I don't see any benefit in keeping it
> > running. But this can be handled in a separate patch as it is not
> > directly concerned with this patch.
>
> If the last logical slot on the primary is a failover slot,
> STATUS_CHANGE with logical_decoding=false could reach the standby
> before the slotsync worker drops the corresponding slot. In this case,
> if we shut down the slotsync worker upon replaying that WAL record, the
> synced (and invalidated) slot could remain. That might be one potential
> benefit of keeping the slotsync worker running even when
> wal_level='replica' (at least until one more synchronization cycle is done).
>

Okay, I will think more on this.
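
For reference while thinking about this, the two behaviors discussed above could be sketched roughly as follows. This is only an illustration in plain C, not code from the patch: every type and function name here (EffectiveWalLevel, SlotSyncState, slotsync_should_start, and so on) is hypothetical. It assumes the worker starts only when sync_replication_slots is on and effective_wal_level is logical, and that a stop request is deferred until one more synchronization cycle has completed, so that a synced (and invalidated) slot can still be dropped.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the standby's effective WAL level. */
typedef enum
{
    EWL_REPLICA,
    EWL_LOGICAL
} EffectiveWalLevel;

/* Hypothetical bookkeeping for the slot-sync worker's lifecycle. */
typedef struct
{
    bool running;       /* worker is currently running */
    bool stop_pending;  /* stop once the next sync cycle finishes */
} SlotSyncState;

/* Start policy: only when the GUC is on and the level is logical. */
static bool
slotsync_should_start(bool sync_replication_slots, EffectiveWalLevel level)
{
    return sync_replication_slots && level >= EWL_LOGICAL;
}

/*
 * Called when a status change with logical_decoding=false is replayed.
 * The worker is not stopped right away; the stop is only recorded.
 */
static void
slotsync_on_decoding_disabled(SlotSyncState *st)
{
    if (st->running)
        st->stop_pending = true;
}

/* Called at the end of every synchronization cycle. */
static void
slotsync_on_cycle_done(SlotSyncState *st)
{
    if (st->stop_pending)
    {
        st->running = false;
        st->stop_pending = false;
    }
}
```

The deferred stop is the part that matters for the failover-slot case: the worker gets one more pass to clean up before it exits.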

> >
> > Next is, when effective_wal_level changes from replica to logical,
> > should we wake up the postmaster to immediately start the slot-sync
> > worker? My vote is yes, but if implementing this introduces too much
> > complexity, especially considering it's a rare scenario, we could
> > leave it as is. In that case, the slot-sync worker would still start,
> > but possibly with a delay of up to 1-2 minutes when the postmaster is
> > sleeping.
>
> After checking other code, I found that we simply send SIGUSR1 to the
> postmaster in pg_promote(). I think we can use it.
>

Okay.
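
To illustrate the wake-up idea, here is a minimal standalone POSIX sketch of one process nudging another with SIGUSR1 so that it re-checks its state immediately instead of waiting for its sleep to time out. It is not postmaster code: the handler, the flag, and the function names are all hypothetical, and the receiving side is heavily simplified. The only part taken from the discussion is the use of SIGUSR1 as the wake-up signal.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the signal handler, consumed by the main loop (hypothetical). */
static volatile sig_atomic_t wakeup_requested = 0;

/* Async-signal-safe handler: only records that a wake-up arrived. */
static void
handle_sigusr1(int signo)
{
    (void) signo;
    wakeup_requested = 1;
}

/* Install the SIGUSR1 handler; returns 0 on success, -1 on failure. */
static int
install_wakeup_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = handle_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Sender side: ask the supervising process to wake up and re-check. */
static int
request_wakeup(pid_t supervisor_pid)
{
    return kill(supervisor_pid, SIGUSR1);
}

/* Receiver side: returns true once per observed wake-up request. */
static bool
consume_wakeup(void)
{
    if (wakeup_requested)
    {
        wakeup_requested = 0;
        return true;
    }
    return false;
}
```

In this sketch the supervising loop would call consume_wakeup() after its sleep is interrupted and then decide whether the worker should be launched.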

thanks
Shveta
