From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-07-30 17:54:14 |
Message-ID: | CAD21AoDFkWxeG6bX1EkGY9=i6P0Xz-PCrw41XNFFGfJXaft4eA@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jul 30, 2025 at 12:22 AM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Sawada-san,
>
> While reading more, I found a race condition.
Thank you for reviewing the patch!
> In this case the effective_wal_level
> can be logical even when there is no logical slot.
> UpdateLogicalDecodingStatusEndOfRecovery() checks the number of logical slots
> and then releases the lock. The startup process then acquires the lock again,
> compares the result with IsLogicalDecodingEnabled(), and updates the status if needed.
> So, wal_level can be inconsistent if the status is changed after n_logical_slots
> is read.
>
> Steps:
> a) constructed a primary-standby system
> b) created a logical slot on the primary
> c) created a logical slot on the standby
> d) sent a promote signal to standby
> e) dropped a logical slot on standby, just after startup process released
> LogicalDecodingControlLock in UpdateLogicalDecodingStatusEndOfRecovery().
>
> After the above, effective_wal_level remained logical. Is it the expected behavior?
No, we need to fix it.
I thought we could fix this issue by checking the number of in-use
logical slots while holding both ReplicationSlotControlLock and
LogicalDecodingControlLock, but it seems we also need to deal with
another race condition, between backends and the startup process at
the end of recovery.
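To make the reported race concrete, here is a toy C model of it. This is
only a sketch and not the patch code: ToyState, toy_interleave_fn and the
toy_* functions are hypothetical names, and real locking is replaced by a
callback that runs at the point where no lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the shared state; NOT PostgreSQL's real structures. */
typedef struct ToyState
{
    int  n_logical_slots;           /* in-use logical slots */
    bool logical_decoding_enabled;  /* what effective_wal_level reflects */
} ToyState;

/* Hypothetical hook: what another process does while no lock is held. */
typedef void (*toy_interleave_fn) (ToyState *st);

/*
 * A backend dropping a logical slot while the server still reports
 * recovery: it changes the slot count but leaves the status alone.
 */
static void
toy_drop_slot_in_recovery(ToyState *st)
{
    assert(st->n_logical_slots > 0);
    st->n_logical_slots--;
}

/* Racy shape: count read in one critical section, status written in another. */
static void
toy_end_of_recovery_racy(ToyState *st, toy_interleave_fn gap)
{
    bool want_enabled = (st->n_logical_slots > 0);      /* section 1 */

    if (gap != NULL)
        gap(st);                /* the lock is not held here */

    if (st->logical_decoding_enabled != want_enabled)   /* section 2 */
        st->logical_decoding_enabled = want_enabled;
}

/* One critical section: the concurrent drop is serialized before it. */
static void
toy_end_of_recovery_atomic(ToyState *st, toy_interleave_fn before)
{
    if (before != NULL)
        before(st);
    st->logical_decoding_enabled = (st->n_logical_slots > 0);
}
```

Starting from one slot with decoding enabled, the racy variant with the
drop in the gap ends with zero slots and decoding still enabled, while the
single-section variant ends with decoding disabled. A drop that is
serialized after the single section is not covered by this model.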
Currently the backend skips controlling logical decoding status if the
server is in recovery (by checking RecoveryInProgress()), but it's
possible that a backend process tries to drop a logical slot after the
startup process has called UpdateLogicalDecodingStatusEndOfRecovery()
and before the server starts accepting writes. In this case, the backend
ends up not disabling logical decoding, so it remains enabled. I think
we would somehow need to delay the logical decoding status change in
this period until the recovery completes.
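As a rough illustration of that window, and of one possible reading of
"delay the status change until the recovery completes", here is a toy C
model. It is only a sketch under my own assumptions, not what the patch
does: ToyServer, disable_pending and the toy_* functions are hypothetical,
and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy server state; NOT PostgreSQL's real structures. */
typedef struct ToyServer
{
    bool in_recovery;               /* what RecoveryInProgress() would report */
    int  n_logical_slots;           /* in-use logical slots */
    bool logical_decoding_enabled;  /* what effective_wal_level reflects */
    bool disable_pending;           /* hypothetical: deferred disable request */
} ToyServer;

/* Current shape: a drop during recovery never touches the status. */
static void
toy_drop_slot_skip(ToyServer *s)
{
    assert(s->n_logical_slots > 0);
    s->n_logical_slots--;
    if (s->in_recovery)
        return;                 /* status control is skipped */
    if (s->n_logical_slots == 0)
        s->logical_decoding_enabled = false;
}

/* Deferred shape: remember that a disable may be needed later. */
static void
toy_drop_slot_deferred(ToyServer *s)
{
    assert(s->n_logical_slots > 0);
    s->n_logical_slots--;
    if (s->in_recovery)
    {
        if (s->n_logical_slots == 0)
            s->disable_pending = true;
        return;
    }
    if (s->n_logical_slots == 0)
        s->logical_decoding_enabled = false;
}

/* Recovery completes: apply a deferred disable if it is still valid. */
static void
toy_finish_recovery(ToyServer *s)
{
    s->in_recovery = false;
    if (s->disable_pending && s->n_logical_slots == 0)
        s->logical_decoding_enabled = false;
    s->disable_pending = false;
}
```

With one slot, decoding enabled, and the server still in recovery,
toy_drop_slot_skip() followed by toy_finish_recovery() leaves decoding
enabled with no slots, whereas toy_drop_slot_deferred() followed by
toy_finish_recovery() disables it.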
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com