Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-08-01 23:23:16
Message-ID: CAD21AoB=Rf-SASOJR2WqvWcrA5Q3S2oUBACVLdJPaA8x6EchBA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 31, 2025 at 5:00 AM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Sawada-san,
>
> > I thought we could fix this issue by checking the number of in-use
> > logical slots while holding ReplicationSlotControlLock and
> > LogicalDecodingControlLock, but it seems we need to deal with another
> > race condition too between backends and startup processes at the end
> > of recovery.
> >
> > Currently the backend skips controlling logical decoding status if the
> > server is in recovery (by checking RecoveryInProgress()), but it's
> > possible that a backend process tries to drop a logical slot after the
> > startup process calls UpdateLogicalDecodingStatusEndOfRecovery() and
> > before the server accepts writes.
>
> Right. I also verified this locally and found that
> ReplicationSlotDropAcquired()->DisableLogicalDecodingIfNecessary() sometimes
> skips modifying the status because RecoveryInProgress() still returns true.
>
> > In this case, the backend ends up not
> > disabling logical decoding and it remains enabled. I think we would
> > somehow need to delay the logical decoding status change in this
> > period until the recovery completes.
>
> My primitive idea was to 1) have the startup process keep holding the lock until
> the end of recovery and 2) have DisableLogicalDecodingIfNecessary() acquire the
> lock before checking the recovery status, but it did not work well. I'm not sure
> why, but WaitForProcSignalBarrier() got stuck when the process had acquired
> LogicalDecodingControlLock....

I think that it's not realistic to keep holding an LWLock until the
recovery actually completes because we perform a checkpoint after
that.

In the latest version of the patch, attached, I introduced a flag in
shared memory to delay any logical decoding status change until
recovery completes. The implementation got more complex than I expected,
but I don't have a better idea. I'm open to other approaches. Also, I
incorporated all comments I've gotten so far[1][2][3] and updated the
documentation.
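
As a rough illustration of the delayed-change idea only (this is not the
code in the attached patch; the struct, field, and function names below
are all hypothetical, and locking is omitted for brevity), a flag that
defers a status change during the end-of-recovery window could behave
like this:

```c
#include <stdbool.h>

/*
 * Hypothetical model of the shared state. These names are NOT taken
 * from the patch; real code would live in shared memory and be
 * protected by a lock, which this sketch leaves out.
 */
typedef struct DecodingCtl
{
    bool decoding_enabled;  /* current logical decoding status */
    bool end_of_recovery;   /* recovery is finishing; writes not yet accepted */
    bool pending_disable;   /* a disable request arrived during that window */
} DecodingCtl;

void
ctl_init(DecodingCtl *ctl, bool enabled)
{
    ctl->decoding_enabled = enabled;
    ctl->end_of_recovery = false;
    ctl->pending_disable = false;
}

/* Startup process: the end-of-recovery window opens. */
void
ctl_begin_end_of_recovery(DecodingCtl *ctl)
{
    ctl->end_of_recovery = true;
}

/*
 * Backend: the last logical slot was dropped.
 * Returns true if the status was changed immediately, or false if the
 * change was only recorded because recovery has not fully completed.
 */
bool
ctl_request_disable(DecodingCtl *ctl)
{
    if (ctl->end_of_recovery)
    {
        ctl->pending_disable = true;
        return false;
    }
    ctl->decoding_enabled = false;
    return true;
}

/* Recovery has fully completed: close the window, apply any deferred change. */
void
ctl_recovery_completed(DecodingCtl *ctl)
{
    ctl->end_of_recovery = false;
    if (ctl->pending_disable)
    {
        ctl->decoding_enabled = false;
        ctl->pending_disable = false;
    }
}
```

In this model, a slot drop that lands inside the window is remembered
rather than skipped, so logical decoding does not stay enabled by
accident. Which process applies the deferred change, and under which
locks, is an assumption of the sketch.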

Regards,

[1] https://www.postgresql.org/message-id/CALDaNm3BfG1hpWVEaqwBgXpcEGSQXDi536OzB2%3D8SFTz-v%2B3CA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAJpy0uDxap0YKLx5N45_Vz49QARjioUaOb1qpaiV0PBkYoivRg%40mail.gmail.com
[3] https://www.postgresql.org/message-id/OSCPR01MB149663D242F6E97630758DD6EF55AA%40OSCPR01MB14966.jpnprd01.prod.outlook.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v5-0001-Enable-logical-decoding-dynamically-based-on-logi.patch application/octet-stream 85.7 KB
