From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | RE: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-08-28 02:45:03 |
Message-ID: | OSCPR01MB14966E989331F1FA7AF06BD9BF53BA@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
Dear Sawada-san,
> > Assume that the logical_decoding value written in the WAL is false here, and that a
> > logical replication slot is created just after that. In my experiments, the following happened:
> >
>
> Let me clarify each step:
>
> > 1. startup process updated logical_decoding_enabled to false, at line 8652.
>
> I assume that logical_decoding_enabled was enabled before step 1.
Right. Initially, a logical replication slot exists on both the primary and the standby.
In more detail, the standby slot was created by the slotsync worker.
> > 2. slotsync worker started to sync. Surprisingly, it created a (second) logical
> > slot and started logical decoding in fast_forward mode.
>
> I guess that the postmaster launched the slotsync worker before the
> startup changes the status since logical decoding was enabled as I
> mentioned above, which seems fine to me.
As you said, the slotsync worker had already been launched when the status was
changed. My feeling was that a logical slot should not be created after the status
in shared memory has been changed.
> > 3. startup invalidated logical slots due to the wal_level. The slot created at
> > step 2 was automatically dropped, because it was not sync-ready yet.
> > 4. startup process shut down the slotsync worker.
> > 5. startup process read the STATUS_CHANGE record again, which has the value "true".
> > It requested to restart the sync worker.
> > 6. restarted sync worker synchronized the slot again...
> >
> > For me it works well, but it is a bit strange because 1) logical decoding is
> > started even when effective_wal_level is false,
>
> I think it's a race condition between the postmaster and the startup,
> it could happen even between the backend and the startup; the startup
> disables logical decoding right after the backend passes
> CheckLogicalDecodingRequirements() check. I think it's technically
> okay since all WAL records before the STATUS_CHANGE should have the
> logical information. Even if it starts to do logical decoding, it
> would end up decoding the STATUS_CHANGE record and with an error (see
> xlog_decode()).
To clarify, you think this does not need to be fixed because the system eventually
reaches the appropriate state, right?
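To check my understanding of that argument, here is a minimal single-threaded sketch of the interleaving. It is only an illustration: every name in it (Rec, check_requirements, decode_stream, simulate_race) is a hypothetical stand-in, not code from the patch, and the real check and the real xlog_decode() are far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record kinds; these are not PostgreSQL's real WAL record types. */
typedef enum { REC_DATA, REC_STATUS_CHANGE } RecKind;

typedef struct
{
    RecKind kind;
    bool    enable;     /* payload of REC_STATUS_CHANGE, unused otherwise */
} Rec;

typedef enum { DECODE_OK, DECODE_ERROR_DISABLED } DecodeResult;

/* Hypothetical stand-in for the flag kept in shared memory. */
bool logical_decoding_enabled = true;

/* Stand-in for the requirements check a backend runs before decoding. */
bool
check_requirements(void)
{
    return logical_decoding_enabled;
}

/*
 * Decode records in order. Stops with an error at the first
 * STATUS_CHANGE(false) record; returns how many data records were decoded.
 */
size_t
decode_stream(const Rec *recs, size_t n, DecodeResult *result)
{
    size_t decoded = 0;

    *result = DECODE_OK;
    for (size_t i = 0; i < n; i++)
    {
        if (recs[i].kind == REC_STATUS_CHANGE && !recs[i].enable)
        {
            *result = DECODE_ERROR_DISABLED;
            break;
        }
        if (recs[i].kind == REC_DATA)
            decoded++;
    }
    return decoded;
}

/* Replay the race: the check passes, then the flag is cleared, then decoding runs. */
size_t
simulate_race(DecodeResult *result)
{
    static const Rec wal[] = {
        {REC_DATA, false},
        {REC_DATA, false},
        {REC_STATUS_CHANGE, false},
        {REC_DATA, false},          /* written after decoding was disabled */
    };
    bool passed;

    logical_decoding_enabled = true;
    passed = check_requirements();      /* backend passes the check */
    logical_decoding_enabled = false;   /* startup replays STATUS_CHANGE: false */
    assert(passed);

    return decode_stream(wal, sizeof(wal) / sizeof(wal[0]), result);
}
```

In this sketch, simulate_race() decodes only the two data records that precede the status change and then stops with DECODE_ERROR_DISABLED, so the record written after the change is never decoded even though the check had already passed.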
> > and 2) the synced slot is
> > dropped once with below message:
> >
> > ```
> > LOG: terminating process 1474448 to release replication slot "test2"
> > DETAIL: Logical decoding on standby requires "wal_level" >= "logical" or at least one logical slot on the primary server.
> > CONTEXT: WAL redo at 0/030000B8 for XLOG/LOGICAL_DECODING_STATUS_CHANGE: false
> > ERROR: canceling statement due to conflict with recovery
> > DETAIL: User was using a logical replication slot that must be invalidated.
> > ```
> >
> > Can we stop the sync worker before updating the status? IIUC this is one possible
> > solution.
>
> I think it would lead to another race condition; the slotsync worker
> can start again before updating the status.
Hmm, okay.
Another small comment: this data structure is not used in other files, so it does not need to be declared extern.
```
extern LogicalDecodingCtlData *LogicalDecodingCtl;
```
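A rough sketch of what I mean, assuming the pointer is only touched inside one source file. The struct layout and the two function names (sketch_ctl_init, sketch_is_enabled) are hypothetical and only for illustration; the real definitions are in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout; the actual fields of LogicalDecodingCtlData may differ. */
typedef struct LogicalDecodingCtlData
{
    bool logical_decoding_enabled;
} LogicalDecodingCtlData;

/* File-local pointer: "static" gives it internal linkage, so no extern is needed. */
static LogicalDecodingCtlData *LogicalDecodingCtl = NULL;

/* Stand-in for the shared-memory area the pointer would normally refer to. */
static LogicalDecodingCtlData sketch_storage;

/* Hypothetical initializer; other files go through functions, not the pointer. */
void
sketch_ctl_init(bool enabled)
{
    LogicalDecodingCtl = &sketch_storage;
    LogicalDecodingCtl->logical_decoding_enabled = enabled;
}

/* Hypothetical accessor exposed to other files instead of the variable itself. */
bool
sketch_is_enabled(void)
{
    return LogicalDecodingCtl != NULL &&
        LogicalDecodingCtl->logical_decoding_enabled;
}
```

With this shape, only the functions would be declared in the header, and the variable stays private to the file.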
Best regards,
Hayato Kuroda
FUJITSU LIMITED