| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Date: | 2025-12-09 04:27:36 |
| Message-ID: | CAA4eK1Jb396AY6pRNXSSWabJuXCSnPA-=HYNQkfoK9FPf+q_wQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Dec 9, 2025 at 12:22 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Dec 8, 2025 at 3:30 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, Dec 8, 2025 at 1:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Dec 5, 2025 at 4:56 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > >
> > > > > Hmm, ISTM that the root cause is that we don't synchronously update
> > > > > the XLogLogicalInfo cache on the standby. A backend on the standby who
> > > > > started when logical decoding is disabled keeps holding
> > > > > XLogLogicalInfo = false until the promotion. So even if the startup
> > > > > process enables logical decoding when applying the STATUS_CHANGE
> > > > > record, logical decoding is enabled on the standby but the backend
> > > > > process would start logical decoding with XLogLogicalInfo = false,
> > > > > resulting in RelationIsAccessibleInLogicalDecoding() returns false. I
> > > > > missed the fact that we check XLogLogicalInfoActive() even read paths.
> > > > > Will fix it.
> > > > >
> > > >
> > > > I noticed that the XLogLogicalInfoXactCache doesn't work as expected
> > > > even on the primary; because the process keep using the
> > > > XLogLogicalInfoXactCache in a transaction, sending
> > > > PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO with a wait doesn't
> > > > actually ensure that all processes are writing logical-WAL records
> > > > before enabling logical decoding. XLogLogicalInfoXactCache could be
> > > > cached even read paths without XID assignment, for example when HOT
> > > > pruning happens while reading a table. So, if a process has been
> > > > executing only read queries while logical decoding is disabled, it has
> > > > XLogLogicalInfoXactCache=0 and can get an XID after logical decoding
> > > > is enabled, writing non-logical WAL records. Such read-only
> > > > transactions don't appear in xl_running_xacts so we cannot wait for
> > > > them to finish during logical decoding initialization.
> > > >
> > > > I think there are two possible solutions to resolve this issue:
> > > >
> > > > 1. Revert the XLogLogicalInfoXactCache idea. While we can simplify the
> > > > code, I'm concerned that it could be problematic if
> > > > XLogLogicalInfoActive() could return different results whenever it's
> > > > called. For instance, we should at least remove the assertion in
> > > > ExecuteTruncateGuts(), but we could face a similar issue in the
> > > > future.
> > >
> > > I'm concerned that even this idea could be problematic without waiting
> > > for all running transactions to finish before enabling logical
> > > decoding, depending on how we write WAL records. If
> > > XLogLogicalInfoActive() could return a different value within the same
> > > command, we could write a WAL record mixiting logical-information and
> > > non-logical information, because some functions use
> > > XLogLogicalInfoActive() several times before writing a one WAL record.
> > > Further, if there is a path where a transaction decides whether to
> > > include logical information based on XLogLogicalInfoActive() *before*
> > > getting an XID, the transaction might not appear in xl_running_xacts
> > > record, resulting in that logical decoding that started before the
> > > transaction decodes such WAL records in the CONSISTENT snapbuild
> > > state.
> > >
> >
> > Are you imagining a case with following steps that could lead to a
> > problem: (a) backend determine whether to include logical_info in WAL
> > via XLogLogicalInfoActive() and the answer is false, (b) assign
> > transaction_id, (c) got a signal to update XLogLogicalInfo to enable
> > logical_info and the backend updates it, (d) backend still writes WAL
> > without logical_info because it uses the information determined in
> > (a), (e) logical decoding won't wait for such a transaction before
> > reaching consistent state as by that time xid was not assigned, (f)
> > decoding results are unpredictable because the WAL doesn't have
> > logical_info?
>
> Exactly. In addition to such potential problems, such WAL records look
> incomplete and bogus, so the validation check could fail if we have in
> the future.
>
> IIUC a similar thing happens even if XLogLogicalInfoActive() returns a
> consistent result within the same transaction, if we don't wait for
> all transactions to finish;
>
> 1. a backend determines not to include logical-info in WAL as
> XLogLogicalInfoActive() returns false.
> 2. logical decoding is enabled and the global signal barrier is sent.
> 3. the backend doesn't update its cache as it's within the transaction
> (will be updated at the next transaction).
> 4. logical decoding started and won't wait for such a transaction
> before reaching a consistent state as the transaction doesn't have an
> XID yet.
> 5. the backend gets an XID and writes WAL records (without logical-info).
> 6. logical decoding decodes such WAL records.
>
> As far as I checked the source code, there does not seem to be such
> code paths but it might be introduced in the future.
>
I think if we don't have transaction-level cache, then chances are
less that such a code path gets introduced even in the future. I think
it is not a good idea to rely on the old value of
XLogLogicalInfoActive(), especially after this patch, when its return
value can change dynamically. We can document/comment such a
precaution, and if in the future there really is a need or argument to
allow such cases, then we can introduce wait during slot_creation.
Having said that, I see your point about introducing a wait having
merits w.r.t. making the code more future-proof, but it is also
possible that such a need never arises, and we unnecessarily let users
pay the price by waiting for all concurrent transactions to finish.
For example, won't we need to consider the possibility of walsender
timeout because of this wait, say for tablesync case where we create
the logical slot? Or similarly connection_timeout? And even if we are
okay based on the theory that the same can happen when we wait for the
snapshot to reach a consistent state during decoding, we need some
logic as we have in SnapBuildWaitSnapshot().
--
With Regards,
Amit Kapila.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Amit Kapila | 2025-12-09 04:42:11 | Re: Proposal: Conflict log history table for Logical Replication |
| Previous Message | Michael Paquier | 2025-12-09 04:16:32 | Re: BUG #19095: Test if function exit() is used fail when linked static |