From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-08-01 23:23:16 |
Message-ID: | CAD21AoB=Rf-SASOJR2WqvWcrA5Q3S2oUBACVLdJPaA8x6EchBA@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Jul 31, 2025 at 5:00 AM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Sawada-san,
>
> > I thought we could fix this issue by checking the number of in-use
> > logical slots while holding ReplicationSlotControlLock and
> > LogicalDecodingControlLock, but it seems we need to deal with another
> > race condition too between backends and startup processes at the end
> > of recovery.
> >
> > Currently the backend skips controlling logical decoding status if the
> > server is in recovery (by checking RecoveryInProgress()), but it's
> > possible that a backend process tries to drop a logical slot after the
> > startup process calls UpdateLogicalDecodingStatusEndOfRecovery() but
> > before the server starts accepting writes.
>
> Right. I also verified this locally and found that
> ReplicationSlotDropAcquired()->DisableLogicalDecodingIfNecessary() sometimes
> skips modifying the status because RecoveryInProgress() still returns true.
>
> > In this case, the backend ends up not
> > disabling logical decoding and it remains enabled. I think we would
> > somehow need to delay the logical decoding status change in this
> > period until the recovery completes.
>
> My primitive idea was to 1) have the startup process hold the lock until the
> end of recovery and 2) have DisableLogicalDecodingIfNecessary() acquire the lock
> before checking the recovery status, but it did not work well. I am not sure why,
> but WaitForProcSignalBarrier() got stuck when the process held
> LogicalDecodingControlLock....
I think that it's not realistic to keep holding an LWLock until the
recovery actually completes because we perform a checkpoint after
that.
In the latest version of the patch (attached), I introduce a flag in shared
memory to delay any logical decoding status change until the recovery
completes. The implementation got more complex than I expected, but I
don't have a better idea. I'm open to other approaches. Also, I
incorporated all the comments I have received so far[1][2][3] and updated the
documentation.
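The deferral idea described above can be illustrated with a minimal, single-process sketch. This is not code from the attached patch: DecodingState, recheck_pending, decoding_state_init(), drop_logical_slot() and complete_recovery() are hypothetical names, and the plain struct only stands in for shared memory. In a real server each function body would run under a short-lived lock, rather than one lock being held across the whole end-of-recovery window.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared-memory state; not the patch's struct. */
typedef struct DecodingState
{
    bool in_recovery;       /* models RecoveryInProgress() */
    bool decoding_enabled;  /* current logical decoding status */
    bool recheck_pending;   /* a status change was requested but deferred */
    int  n_logical_slots;   /* number of in-use logical slots */
} DecodingState;

/* Models the state after the startup process has set the status at the
 * end of recovery, but before recovery has actually completed. */
static void
decoding_state_init(DecodingState *s, int n_slots)
{
    s->in_recovery = true;
    s->decoding_enabled = (n_slots > 0);
    s->recheck_pending = false;
    s->n_logical_slots = n_slots;
}

/* Backend path: drop one logical slot. */
static void
drop_logical_slot(DecodingState *s)
{
    assert(s->n_logical_slots > 0);
    s->n_logical_slots--;

    if (s->in_recovery)
        s->recheck_pending = true;      /* defer instead of silently skipping */
    else if (s->n_logical_slots == 0)
        s->decoding_enabled = false;    /* normal case: disable right away */
}

/* Startup path: recovery has completed; apply any deferred change. */
static void
complete_recovery(DecodingState *s)
{
    s->in_recovery = false;
    if (s->recheck_pending)
    {
        s->decoding_enabled = (s->n_logical_slots > 0);
        s->recheck_pending = false;
    }
}
```

In this model, dropping the last slot during the window leaves decoding enabled but marks a recheck, and complete_recovery() then disables it, so the status does not stay enabled by accident.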
Regards,
[1] https://www.postgresql.org/message-id/CALDaNm3BfG1hpWVEaqwBgXpcEGSQXDi536OzB2%3D8SFTz-v%2B3CA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAJpy0uDxap0YKLx5N45_Vz49QARjioUaOb1qpaiV0PBkYoivRg%40mail.gmail.com
[3] https://www.postgresql.org/message-id/OSCPR01MB149663D242F6E97630758DD6EF55AA%40OSCPR01MB14966.jpnprd01.prod.outlook.com
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Enable-logical-decoding-dynamically-based-on-logi.patch | application/octet-stream | 85.7 KB |