From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-09-17 16:53:37 |
Message-ID: | CAD21AoALaRUZkec7+XL_vFn0=wW8UbObS=FhymUK=zOeHxTMow@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Sep 17, 2025 at 4:19 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Sep 16, 2025 at 11:49 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Sep 16, 2025 at 1:30 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > When user is dropping a temporary slot, we should disable the
> > > decoding. The lazy behaviour should be for ERROR or session_exit
> > > cases.
> >
> > I think it might be worth discussing whether to use lazy behavior in
> > all cases.
> >
>
> Agreed.
>
> > There are several advantages:
> >
> > - It mitigates the risk of connection timeouts during a logical slot
> > drop or a subscription drop.
> > - In scenarios involving frequent creation and deletion of logical
> > slots (such as during initial data synchronization), it could
> > potentially avoid the issue of a frequent switch on and off.
> >
> > On the other hand, drawbacks are:
> >
> > - users would have to wait for effective_wal_level to get decreased to
> > 'replica' somehow.
> > - makes the checkpointer more busy in addition to its checkpointing job.
> > - it could take a longer time to disable logical decoding if the
> > checkpointer is busy with a checkpointing job.
> >
>
> This last point in drawback could hurt performance of systems for a
> longer time when that was really not required. It should be okay to
> use lazy behavior in all cases when we can do that in a predictable
> time.
Agreed.
If we use the lazy behavior in ERROR or session_exit cases, we would
have these drawbacks anyway. But assuming it won't happen frequently
in practice, we can live with that.
> The other background process to consider doing lazy processing
> is the launcher whose role is to launch apply workers for subscription
> and maintain a conflict_slot (if required). Now, because disabling
> logical_info could also take longer time in worst cases, the
> launcher's own tasks can become unpredictable. Also, if tomorrow, we
> decide to support dynamically changing wal_level from minimal to some
> upper level, the launcher won't be the appropriate process.
Right. Also, we don't launch the launcher process when
max_logical_replication_workers == 0. It should be >0 on the
subscriber but might not be on the publisher.
>
> The other idea could be to have a new auxiliary process to disable
> logical_info lazily. It is arguable if we just have a separate process
> for this purpose but we have previously discussed some other tasks for
> such a process like removal of old_serialized_snapshots and
> old_logical_rewrite_map files. See [1]. If we agree to have a
> separate process for this purpose then disabling logical_info in all
> cases sounds okay to me.
Yeah, the custodian worker would be one solution. But please refer to
the subsequent discussions[1][2]; there might be no tasks to delegate
to the custodian worker other than this logical decoding deactivation,
and it might not be optimal to have a single worker that is
responsible for all custodian work. Actually, we've discussed a
similar idea on this thread, and I drafted a patch[3] that uses
bgworkers to do internal tasks in the background in a
one-task-per-worker manner.
It requires more discussion anyway if we want to go in this
direction. I think we can start with using lazy behavior in ERROR or
session_exit cases (assuming they won't happen frequently in practice),
and consider using lazy behavior in other cases if it's really
preferable.
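To make the proposed split concrete, here is a toy Python model of it.
This is only a sketch, not PostgreSQL code: every name in it
(ToyServer, ReleaseReason, background_pass, and so on) is hypothetical
and invented for illustration, and it assumes that decoding is turned
off only once the last logical slot is gone. An explicit drop disables
decoding right away, while an ERROR or session_exit release only
records a request that some background process picks up later.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class ReleaseReason(Enum):
    EXPLICIT_DROP = auto()  # user explicitly dropped the slot
    ERROR = auto()          # slot released during error cleanup
    SESSION_EXIT = auto()   # temporary slot released at session exit


@dataclass
class ToyServer:
    """Toy model only; none of these names exist in PostgreSQL."""
    logical_slots: set = field(default_factory=set)
    logical_decoding_enabled: bool = False
    disable_requested: bool = False

    def create_slot(self, name):
        # Creating a logical slot enables decoding and cancels any
        # pending lazy request to disable it.
        self.logical_slots.add(name)
        self.logical_decoding_enabled = True
        self.disable_requested = False

    def release_slot(self, name, reason):
        self.logical_slots.discard(name)
        if self.logical_slots:
            return  # other logical slots still need decoding
        if reason is ReleaseReason.EXPLICIT_DROP:
            # Eager path: disable as part of the drop itself.
            self.logical_decoding_enabled = False
        else:
            # Lazy path: leave the work to a background process.
            self.disable_requested = True

    def background_pass(self):
        # Stand-in for whichever process ends up doing the lazy work.
        if self.disable_requested and not self.logical_slots:
            self.logical_decoding_enabled = False
        self.disable_requested = False


server = ToyServer()
server.create_slot("tmp_slot")
server.release_slot("tmp_slot", ReleaseReason.SESSION_EXIT)
# Decoding stays enabled until the background pass runs.
server.background_pass()
```

In this model the lazy path leaves decoding enabled for an unbounded
time between the release and the next background pass, which is the
drawback discussed above.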
Regards,
[1] https://www.postgresql.org/message-id/1058306.1680467858%40sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/20230402184226.kkjplqvqu6utvzbt%40awork3.anarazel.de
[3] https://www.postgresql.org/message-id/CAD21AoCPc%2BpEgb0pJeiS2CU39ad8VW-10Ze7Uii%3D1RRjfgQ0uw%40mail.gmail.com
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com