Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-12-10 04:43:21
Message-ID: CAA4eK1JUDhEpY9i64NA1hN65XLuZsnuNpW81O0V=K6zcdUoK=w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 10, 2025 at 4:39 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Dec 8, 2025 at 8:27 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Dec 9, 2025 at 12:22 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Dec 8, 2025 at 3:30 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Dec 8, 2025 at 1:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Fri, Dec 5, 2025 at 4:56 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > >
> > > > > > > Hmm, ISTM that the root cause is that we don't synchronously update
> > > > > > > the XLogLogicalInfo cache on the standby. A backend on the standby who
> > > > > > > started when logical decoding is disabled keeps holding
> > > > > > > XLogLogicalInfo = false until the promotion. So even if the startup
> > > > > > > process enables logical decoding when applying the STATUS_CHANGE
> > > > > > > record, logical decoding is enabled on the standby but the backend
> > > > > > > process would start logical decoding with XLogLogicalInfo = false,
> > > > > > > resulting in RelationIsAccessibleInLogicalDecoding() returns false. I
> > > > > > > missed the fact that we check XLogLogicalInfoActive() even read paths.
> > > > > > > Will fix it.
> > > > > > >
> > > > > >
> > > > > > I noticed that the XLogLogicalInfoXactCache doesn't work as expected
> > > > > > even on the primary; because the process keep using the
> > > > > > XLogLogicalInfoXactCache in a transaction, sending
> > > > > > PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO with a wait doesn't
> > > > > > actually ensure that all processes are writing logical-WAL records
> > > > > > before enabling logical decoding. XLogLogicalInfoXactCache could be
> > > > > > cached even read paths without XID assignment, for example when HOT
> > > > > > pruning happens while reading a table. So, if a process has been
> > > > > > executing only read queries while logical decoding is disabled, it has
> > > > > > XLogLogicalInfoXactCache=0 and can get an XID after logical decoding
> > > > > > is enabled, writing non-logical WAL records. Such read-only
> > > > > > transactions don't appear in xl_running_xacts so we cannot wait for
> > > > > > them to finish during logical decoding initialization.
> > > > > >
> > > > > > I think there are two possible solutions to resolve this issue:
> > > > > >
> > > > > > 1. Revert the XLogLogicalInfoXactCache idea. While we can simplify the
> > > > > > code, I'm concerned that it could be problematic if
> > > > > > XLogLogicalInfoActive() could return different results whenever it's
> > > > > > called. For instance, we should at least remove the assertion in
> > > > > > ExecuteTruncateGuts(), but we could face a similar issue in the
> > > > > > future.
> > > > >
> > > > > I'm concerned that even this idea could be problematic without waiting
> > > > > for all running transactions to finish before enabling logical
> > > > > decoding, depending on how we write WAL records. If
> > > > > XLogLogicalInfoActive() could return a different value within the same
> > > > > command, we could write a WAL record mixiting logical-information and
> > > > > non-logical information, because some functions use
> > > > > XLogLogicalInfoActive() several times before writing a one WAL record.
> > > > > Further, if there is a path where a transaction decides whether to
> > > > > include logical information based on XLogLogicalInfoActive() *before*
> > > > > getting an XID, the transaction might not appear in xl_running_xacts
> > > > > record, resulting in that logical decoding that started before the
> > > > > transaction decodes such WAL records in the CONSISTENT snapbuild
> > > > > state.
> > > > >
> > > >
> > > > Are you imagining a case with following steps that could lead to a
> > > > problem: (a) backend determine whether to include logical_info in WAL
> > > > via XLogLogicalInfoActive() and the answer is false, (b) assign
> > > > transaction_id, (c) got a signal to update XLogLogicalInfo to enable
> > > > logical_info and the backend updates it, (d) backend still writes WAL
> > > > without logical_info because it uses the information determined in
> > > > (a), (e) logical decoding won't wait for such a transaction before
> > > > reaching consistent state as by that time xid was not assigned, (f)
> > > > decoding results are unpredictable because the WAL doesn't have
> > > > logical_info?
> > >
> > > Exactly. In addition to such potential problems, such WAL records look
> > > incomplete and bogus, so the validation check could fail if we have in
> > > the future.
> > >
> > > IIUC a similar thing happens even if XLogLogicalInfoActive() returns a
> > > consistent result within the same transaction, if we don't wait for
> > > all transactions to finish;
> > >
> > > 1. a backend determines not to include logical-info in WAL as
> > > XLogLogicalInfoActive() returns false.
> > > 2. logical decoding is enabled and the global signal barrier is sent.
> > > 3. the backend doesn't update its cache as it's within the transaction
> > > (will be updated at the next transaction).
> > > 4. logical decoding started and won't wait for such a transaction
> > > before reaching a consistent state as the transaction doesn't have an
> > > XID yet.
> > > 5. the backend gets an XID and writes WAL records (without logical-info).
> > > 6. logical decoding decodes such WAL records.
> > >
> > > As far as I checked the source code, there does not seem to be such
> > > code paths but it might be introduced in the future.
> > >
> >
> > I think if we don't have transaction-level cache, then chances are
> > less that such a code path gets introduced even in the future.
>
> Why do you think the chances are less without the transaction-level
> cache? Did you mean to say "if we have the transaction-level cache"?
>

I think the chances are higher with transaction-level cache because
the result is cached. Consider the above 6 steps mentioned by you, in
step-5, as we have a transaction-level cache, we won't LOG logical-wal
info and the logical decoding process could go wrong.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message cca5507 2025-12-10 05:01:18 Re: [BUG] Incorrect historic snapshot may be serialized to disk during fast-forwarding
Previous Message Chao Li 2025-12-10 04:26:39 Re: Improve pg_sync_replication_slots() to wait for primary to advance