Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-09-05 23:21:02
Message-ID: CAD21AoAXZmk3os1c4Hf=Gzd82-rH=g6sPGRyhWC5i4eMQTxYrw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 5, 2025 at 3:15 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Sep 4, 2025 at 11:15 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > od On Tue, Sep 2, 2025 at 8:11 PM Hayato Kuroda (Fujitsu)
> > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > >
> > > Dear Sawada-san,
> > >
> > > Here are my comments.
> > >
> > > 01.
> > > ```
> > > checkPoint.logicalDecodingEnabled = IsLogicalDecodingEnabled();
> > > ```
> > >
> > > Per my analysis, the value is always false here because StartupLogicalDecodingStatus
> > > is not called yet. Can we use "false" directly?
> >
> > I think that it's better to read the shared flag instead of directly
> > setting false since LogicalDecodingCtl is already initialized.
> >
> > >
> > > 02.
> > > ```
> > > elog(DEBUG1, "waiting for %d transactions to complete", running->xcnt);
> > > ```
> > >
> > > Here plural form is always used even if the running transaction is only one.
> > > How about something like:
> > > ```
> > > Number of transactions to wait finishing: %d
> > > ```
> >
> > Hmm, not sure it improves the message. I think we don't care much
> > about plurals in debug messages. And in our convention the main log
> > message doesn't start a capital character.
> >
> >
> > >
> > > 03.
> > > ```
> > > while (RecoveryInProgress())
> > > {
> > > pgstat_report_wait_start(WAIT_EVENT_LOGICAL_DECODING_STATUS_CHANGE_DELAY);
> > > pg_usleep(100000L); /* wait for 100 msec */
> > > pgstat_report_wait_end();
> > > }
> > > ```
> > >
> > > I found a stuck case here: if a backend process within the loop and startup waits
> > > a signal is processed, both of them can stuck. The backend waits the recovery
> > > state to be DONE, and the startup waits all processes handle consume the signal.
> > > IIUC we must add CHECK_FOR_INTERRUPTS() or ProcessProcSignalBarrier().
> > >
> > > Actual steps:
> > >
> > > 0. constructed a streaming replication system, which the only primary server had
> > > a logical slot. I.e., the effective_wal_level was logical.
> > > 1. connected to a standby node
> > > 2. attached to the backend process via gdb
> > > 3. added a breakpoint at create_logical_replication_slot
> > > 4. called pg_create_logical_replication_slot() on the backend.
> > > the backend will stop before ReplicationSlotCreate().
> > > 5. from another terminal, attached to the startup process via gdb
> > > 6. added a breakpoint at UpdateLogicalDecodingStatusEndOfRecovery()
> > > 7. from another terminal, send a promote signal to the standby.
> > > The startup will stop at UpdateLogicalDecodingStatusEndOfRecovery()
> > > 8. executed steps on startup process, untill delay_status_change was updated
> > > and LogicalDecodingControlLock was released.
> > > 9. detached from the backend process. It would stop at the loop in
> > > start_logical_decoding_status_change().
> > > 10. detached from the startup process. It would wait all processes handled the
> > > signal, but the backend won't do.
> >
> > Good find! I'll fix the problem by adding CHECK_FOR_INTERRUPTS() as
> > you suggested.
>
> I've attached the updated patch that incorporated all comments I got so far.

Sorry, I've attached the wrong version. Please find the attached correct one.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v14-0001-Enable-logical-decoding-dynamically-based-on-log.patch application/octet-stream 91.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2025-09-06 00:17:03 Re: Extension security improvement: Add support for extensions with an owned schema
Previous Message Jeff Davis 2025-09-05 23:07:09 Re: Should io_method=worker remain the default?