Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-07-30 17:54:14
Message-ID: CAD21AoDFkWxeG6bX1EkGY9=i6P0Xz-PCrw41XNFFGfJXaft4eA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 30, 2025 at 12:22 AM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Sawada-san,
>
> While reading more, I found a race condition.

Thank you for reviewing the patch!

> In this case the effective_wal_level
> can be logical even when there is no logical slot.
> UpdateLogicalDecodingStatusEndOfRecovery() checks the number of slots of the logical
> slot then release the lock once. Then startup process acquires the lock once and
> compare with IsLogicalDecodingEnabled(), then update the status afterward if needed.
> So, wal_level can be inconsistent if the status is changed after the n_logical_slots
> is read.
>
> Steps:
> a) constructed a primary-standby system
> b) createad a logical slot on the primary
> c) createad a logical slot on the standby
> d) sent a promote signal to standby
> e) dropped a logical slot on standby, just after startup process released
> LogicalDecodingControlLock in UpdateLogicalDecodingStatusEndOfRecovery().
>
> After the above, effective_wal_level was keep turning on. Is it the expected behavior?

No, we need to fix it.

I thought we could fix this issue by checking the number of in-use
logical slots while holding ReplicationSlotControlLock and
LogicalDecodingControlLock, but it seems we need to deal with another
race condition too between backends and startup processes at the end
of recovery.

Currently the backend skips controlling logical decoding status if the
server is in recovery (by checking RecoveryInProgress()), but it's
possible that a backend process tries to drop a logical slot after the
startup process calling UpdateLogicalDecodingStatusEndOfRecovery() and
before accepting writes. In this case, the backend ends up not
disabling logical decoding and it remains enabled. I think we would
somehow need to delay the logical decoding status change in this
period until the recovery completes.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2025-07-30 18:05:48 Re: vacuumdb changes for stats import/export
Previous Message Jeff Davis 2025-07-30 17:44:00 Re: vacuumdb changes for stats import/export