RE: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>
Cc: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-08-28 02:45:03
Message-ID: OSCPR01MB14966E989331F1FA7AF06BD9BF53BA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Sawada-san,

> > Assuming that logical_decoding written in the WAL is false here, and a logical
> > replication slot is created just after that. In my experiments below happened:
> >
>
> Let me clarify each step:
>
> > 1. startup process updated logical_decoding_enabled to false, at line 8652.
>
> I assume that logical_decoding_enabled was enabled before step 1.

Right. Initially, a logical replication slot existed on both the primary and the
standby. In more detail, the standby slot was created by the slotsync worker.

> > 2. slotsync worker started to sync. Surprisingly, it created a (second) logical
> > slot and started logical decoding with fast_forward mode.
>
> I guess that the postmaster launched the slotsync worker before the
> startup changes the status since logical decoding was enabled as I
> mentioned above, which seems fine to me.

As you said, the slotsync worker had already been launched when the status was
changed. I felt that a logical slot should not be created after the status in
shared memory has been changed.

> > 3. startup invalidated logical slots due to the wal_level. the slot created at
> > step2 was automatically dropped, because it was not sync-ready yet.
> > 4. startup process shut down the slotsync worker.
> > 5. start process read the STATUS_CHANGE record again, which has the value "true".
> > it requested to restart the sync worker.
> > 6. restarted sync worker synchronize the slot again...
> >
> > For me it works well but it is a bit strange because 1) logical decoding is
> > started even when effective_wal_level is false,
>
> I think it's a race condition between the postmaster and the startup,
> it could happen even between the backend and the startup; the startup
> disables logical decoding right after the backend passes
> CheckLogicalDecodingRequirements() check. I think it's technically
> okay since all WAL records before the STATUS_CHANGE should have the
> logical information. Even if it starts to do logical decoding, it
> would end up decoding the STATUS_CHANGE record and with an error (see
> xlog_decode()).

To clarify, you think this does not need to be fixed because the system
eventually reaches the appropriate state, right?
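
For illustration only, the interleaving discussed here can be modelled with a
toy program. This is a sketch under assumptions: the types and the `toy_*`
functions are hypothetical stand-ins, not PostgreSQL code, and the "error" is
just a flag rather than the real error path in xlog_decode().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified record kinds; not PostgreSQL's real definitions. */
typedef enum
{
    TOY_REC_DATA,
    TOY_REC_STATUS_CHANGE_OFF   /* models STATUS_CHANGE with value "false" */
} ToyRecordKind;

/* Hypothetical stand-in for the status kept in shared memory. */
typedef struct
{
    bool logical_decoding_enabled;
} ToySharedState;

/* Toy stand-in for the CheckLogicalDecodingRequirements() check. */
static bool
toy_check_requirements(const ToySharedState *st)
{
    return st->logical_decoding_enabled;
}

/*
 * Decode records in order. Returns how many records were decoded before
 * stopping, and sets *failed when a "false" status change is reached.
 */
static size_t
toy_decode(const ToyRecordKind *wal, size_t n, bool *failed)
{
    *failed = false;
    for (size_t i = 0; i < n; i++)
    {
        if (wal[i] == TOY_REC_STATUS_CHANGE_OFF)
        {
            *failed = true;     /* decoding ends with an error here */
            return i;
        }
    }
    return n;
}

/*
 * Replays the interleaving: the check passes, the startup process then
 * flips the flag, and decoding starts anyway.
 */
static size_t
toy_race(bool *check_passed, bool *failed)
{
    ToySharedState st = { .logical_decoding_enabled = true };
    const ToyRecordKind wal[] = {
        TOY_REC_DATA, TOY_REC_DATA, TOY_REC_STATUS_CHANGE_OFF, TOY_REC_DATA
    };

    *check_passed = toy_check_requirements(&st);    /* backend side */
    st.logical_decoding_enabled = false;            /* startup side */
    return toy_decode(wal, sizeof(wal) / sizeof(wal[0]), failed);
}
```

In this model the decoder handles only the records that precede the status
change and then stops, which is the sense in which the system still ends up in
the appropriate state.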

> > and 2) the synced slot is
> > dropped once with below message:
> >
> > ```
> > LOG: terminating process 1474448 to release replication slot "test2"
> > DETAIL: Logical decoding on standby requires "wal_level" >= "logical" or at least one logical slot on the primary server.
> > CONTEXT: WAL redo at 0/030000B8 for XLOG/LOGICAL_DECODING_STATUS_CHANGE: false
> > ERROR: canceling statement due to conflict with recovery
> > DETAIL: User was using a logical replication slot that must be invalidated.
> > ```
> >
> > Can we stop the sync worker before updating the status? IIUC this is one of the
> > solution.
>
> I think it would lead to another race condition; the slotsync worker
> can start again before updating the status.

Hmm, okay.

Another small comment: this data structure is not used in other files, so there
is no need to declare it extern.

```
extern LogicalDecodingCtlData *LogicalDecodingCtl;
```
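
A minimal sketch of the alternative, assuming the pointer is referenced from
only one .c file. The struct layout and the `toy_*` accessor names are
hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout; the real LogicalDecodingCtlData may differ. */
typedef struct LogicalDecodingCtlData
{
    bool logical_decoding_enabled;
} LogicalDecodingCtlData;

/*
 * 'static' gives the pointer internal linkage, so it is visible only in
 * this file and no extern declaration is needed in a header.
 */
static LogicalDecodingCtlData *LogicalDecodingCtl = NULL;

/* Hypothetical initializer: point at the given memory, start disabled. */
void
toy_logical_decoding_ctl_init(LogicalDecodingCtlData *mem)
{
    mem->logical_decoding_enabled = false;
    LogicalDecodingCtl = mem;
}

/* Hypothetical setter used in place of touching the pointer directly. */
void
toy_set_logical_decoding_enabled(bool enabled)
{
    LogicalDecodingCtl->logical_decoding_enabled = enabled;
}

/* Hypothetical getter; other files would call this function. */
bool
toy_is_logical_decoding_enabled(void)
{
    return LogicalDecodingCtl->logical_decoding_enabled;
}
```

Other files would then see only the accessor functions, not the variable
itself.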

Best regards,
Hayato Kuroda
FUJITSU LIMITED
