Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-09-16 18:18:29
Message-ID: CAD21AoA77mhD5j2bGR2gJ2TzvBQ+=6ZuepWuzZPKfZJRTpEArg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 16, 2025 at 1:30 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Sep 15, 2025 at 10:15 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sun, Sep 14, 2025 at 7:55 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Sep 12, 2025 at 11:18 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Thu, Sep 11, 2025 at 9:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > >
> > > > > For the shutdown sequence, can't we think of resetting effective_wal
> > > > > after a restart?
> > > >
> > > > Does it mean that effective_wal_level keeps 'logical' until the next
> > > > server starts?
> > > >
> > >
> > > Yes, IIUC, effective_wal_level is anyway a derived value based on
> > > current wal_level and presence of logical slots. So, what will be the
> > > impact if it is not accurate at shutdown?
> >
> > I think there won't be an impact at shutdown time. I would rather be
> > concerned that such behavior could confuse users. I think it would not
> > be a rare situation where users enable and disable logical decoding by
> > creating and dropping a temporary slot. If we keep effective_wal_level
> > 'logical' in this case, users would want to somehow disable logical
> > decoding as it could have a negative performance impact.
> >
>
> When user is dropping a temporary slot, we should disable the
> decoding. The lazy behaviour should be for ERROR or session_exit
> cases.

I think it might be worth discussing whether to use lazy behavior in
all cases. There are several advantages:

- It mitigates the risk of connection timeouts during a logical slot
drop or a subscription drop.
- In scenarios involving frequent creation and deletion of logical
slots (such as during initial data synchronization), it could
avoid repeatedly switching logical decoding on and off.

On the other hand, the drawbacks are:

- Users would have to wait for effective_wal_level to be lowered back
to 'replica' at some later point.
- It gives the checkpointer more work on top of its checkpointing job.
- It could take longer to disable logical decoding if the
checkpointer is busy with a checkpoint.
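
To make the trade-off concrete, here is a toy model of the two
behaviors. It is only a sketch, not code from the patch: ToyServer,
logical_decoding_enabled, toggles, and churn are hypothetical names,
and it assumes the lazy disable happens at the next checkpoint.

```python
from dataclasses import dataclass


@dataclass
class ToyServer:
    """Toy model only; these names are hypothetical, not PostgreSQL internals."""
    wal_level: str = "replica"
    lazy: bool = True                  # True: disable at checkpoint; False: disable on drop
    logical_slots: int = 0
    logical_decoding_enabled: bool = False
    toggles: int = 0                   # how many times decoding was switched on or off

    @property
    def effective_wal_level(self) -> str:
        # Derived value: 'logical' if configured so, or if decoding is enabled.
        if self.wal_level == "logical" or self.logical_decoding_enabled:
            return "logical"
        return self.wal_level

    def _set_enabled(self, value: bool) -> None:
        if self.logical_decoding_enabled != value:
            self.logical_decoding_enabled = value
            self.toggles += 1

    def _disable_if_unused(self) -> None:
        if self.logical_slots == 0:
            self._set_enabled(False)

    def create_slot(self) -> None:
        self.logical_slots += 1
        self._set_enabled(True)

    def drop_slot(self) -> None:
        if self.logical_slots == 0:
            raise ValueError("no logical slot to drop")
        self.logical_slots -= 1
        if not self.lazy:
            self._disable_if_unused()  # eager: pay the cost inside the drop

    def checkpoint(self) -> None:
        self._disable_if_unused()      # lazy: the deferred work happens here


def churn(server: ToyServer, times: int) -> None:
    """Create and drop a slot repeatedly, as during initial data sync."""
    for _ in range(times):
        server.create_slot()
        server.drop_slot()


eager = ToyServer(lazy=False)
churn(eager, 3)
print(eager.effective_wal_level, eager.toggles)   # replica 6

lazy = ToyServer(lazy=True)
churn(lazy, 3)
lazy_before = lazy.effective_wal_level
print(lazy_before, lazy.toggles)                  # logical 1
lazy.checkpoint()
print(lazy.effective_wal_level, lazy.toggles)     # replica 2
```

In this model the lazy variant switches twice instead of six times,
but effective_wal_level stays 'logical' until a checkpoint runs.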

What do you think?

>
> > There would
> > be two ways for users to change it to 'replica': restart the server or
> > create and drop a logical slot again.
> >
>
> If we do the lazy work during the checkpoint then they can perform the
> checkpoint command.

Right.

>
> > On the other hand, for users who
> > dropped a non-temporary logical slot without an error or dropped the
> > non-last temporary slot, logical decoding is disabled without other
> > manual interventions. It could be pretty hard to assess the situation,
> > resulting in having users always checking effective_wal_level after
> > dropping a logical slot and doing extra steps to make the
> > effective_wal_level 'replica'.
> >
>
> When the last slot is dropped, anyway, users won't be able to perform
> any decoding. Do you mean that they want to know whether logical_wal
> is still being recorded? If so, then checking effective_wal_level
> would be the way.

I think the situation users would want to avoid is that logical
decoding stays enabled (and therefore logical_wal keeps being written)
even when they don't want to use logical decoding, because it means
the system is paying an unnecessary cost for writing logical_wal. It
would not be a problem if the lazy behavior can ensure that logical
decoding is eventually disabled within a reasonably short time in all
cases. On the other hand, I think it would not be a good user
experience if users are required to restart the server or take other
manual steps in some specific scenarios in order to disable logical
decoding.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
