RE: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: RE: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-07-31 12:00:16
Message-ID: OSCPR01MB1496686BCD0C40745BB03BBB3F527A@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Sawada-san,

> I thought we could fix this issue by checking the number of in-use
> logical slots while holding ReplicationSlotControlLock and
> LogicalDecodingControlLock, but it seems we need to deal with another
> race condition too between backends and startup processes at the end
> of recovery.
>
> Currently the backend skips controlling logical decoding status if the
> server is in recovery (by checking RecoveryInProgress()), but it's
> possible that a backend process tries to drop a logical slot after the
> startup process calling UpdateLogicalDecodingStatusEndOfRecovery() and
> before accepting writes.

Right. I also verified this locally and found that
ReplicationSlotDropAcquired()->DisableLogicalDecodingIfNecessary() sometimes
skips modifying the status because RecoveryInProgress() still returns true.
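
To make the interleaving concrete, here is a toy, single-threaded model of it
in C. It is only a sketch under stated assumptions: the ToyState struct and all
toy_* functions are hypothetical stand-ins invented for illustration, not the
patch's code, and the real functions named above do much more (locking, WAL,
barriers). The model only replays the order of events described in the quoted
text.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the shared state; every name here is hypothetical. */
typedef struct ToyState
{
    bool in_recovery;       /* what RecoveryInProgress() would report */
    bool logical_decoding;  /* is logical decoding currently enabled? */
    int  n_logical_slots;   /* number of logical slots in use */
} ToyState;

/* Startup process: settle the status at end of recovery from the slot count. */
void
toy_update_status_end_of_recovery(ToyState *s)
{
    s->logical_decoding = (s->n_logical_slots > 0);
}

/* Backend: disable logical decoding after a drop, unless still in recovery. */
void
toy_disable_if_necessary(ToyState *s)
{
    if (s->in_recovery)
        return;             /* skipped: the backend does not touch the status */
    if (s->n_logical_slots == 0)
        s->logical_decoding = false;
}

/* Backend: drop one logical slot, then re-evaluate the status. */
void
toy_drop_logical_slot(ToyState *s)
{
    assert(s->n_logical_slots > 0);
    s->n_logical_slots--;
    toy_disable_if_necessary(s);
}

/*
 * Problematic order: the drop lands after the end-of-recovery update but
 * before recovery is reported as finished. Returns true if logical decoding
 * is left enabled with no logical slots.
 */
bool
toy_replay_race(void)
{
    ToyState s = {.in_recovery = true, .logical_decoding = true,
                  .n_logical_slots = 1};

    toy_update_status_end_of_recovery(&s);  /* startup: one slot, stays enabled */
    toy_drop_logical_slot(&s);              /* backend: in recovery, so skipped */
    s.in_recovery = false;                  /* startup: recovery completes */

    return s.logical_decoding && s.n_logical_slots == 0;
}

/* Same steps, but the drop happens only after recovery has completed. */
bool
toy_replay_drop_after_recovery(void)
{
    ToyState s = {.in_recovery = true, .logical_decoding = true,
                  .n_logical_slots = 1};

    toy_update_status_end_of_recovery(&s);
    s.in_recovery = false;
    toy_drop_logical_slot(&s);              /* backend: now disables decoding */

    return s.logical_decoding && s.n_logical_slots == 0;
}
```

In this model toy_replay_race() reports the stale "enabled with zero slots"
state, while toy_replay_drop_after_recovery() does not, which is why the
window between the end-of-recovery update and accepting writes matters.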

> In this case, the backend ends up not
> disabling logical decoding and it remains enabled. I think we would
> somehow need to delay the logical decoding status change in this
> period until the recovery completes.

My first rough idea was to 1) have the startup process hold the lock until the
end of recovery and 2) have DisableLogicalDecodingIfNecessary() acquire the lock
before checking the recovery status, but it did not work well. I'm not sure why,
but WaitForProcSignalBarrier() got stuck when the process had acquired
LogicalDecodingControlLock...

Best regards,
Hayato Kuroda
FUJITSU LIMITED
