Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date: 2025-12-10 10:02:47
Message-ID: CAJpy0uAtHmxbRVobuzkfEWG+n0OVE4y6YpTUbBVR2_y_F1ZOMQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 10, 2025 at 10:13 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Dec 10, 2025 at 4:39 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Dec 8, 2025 at 8:27 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Dec 9, 2025 at 12:22 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Dec 8, 2025 at 3:30 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Mon, Dec 8, 2025 at 1:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Fri, Dec 5, 2025 at 4:56 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > >
> > > > > > > > Hmm, ISTM that the root cause is that we don't synchronously update
> > > > > > > > the XLogLogicalInfo cache on the standby. A backend on the standby who
> > > > > > > > started when logical decoding is disabled keeps holding
> > > > > > > > XLogLogicalInfo = false until the promotion. So even if the startup
> > > > > > > > process enables logical decoding when applying the STATUS_CHANGE
> > > > > > > > record, logical decoding is enabled on the standby but the backend
> > > > > > > > process would start logical decoding with XLogLogicalInfo = false,
> > > > > > > > resulting in RelationIsAccessibleInLogicalDecoding() returns false. I
> > > > > > > > missed the fact that we check XLogLogicalInfoActive() even read paths.
> > > > > > > > Will fix it.
> > > > > > > >
> > > > > > >
> > > > > > > I noticed that the XLogLogicalInfoXactCache doesn't work as expected
> > > > > > > even on the primary; because the process keep using the
> > > > > > > XLogLogicalInfoXactCache in a transaction, sending
> > > > > > > PROCSIGNAL_BARRIER_UPDATE_XLOG_LOGICAL_INFO with a wait doesn't
> > > > > > > actually ensure that all processes are writing logical-WAL records
> > > > > > > before enabling logical decoding. XLogLogicalInfoXactCache could be
> > > > > > > cached even read paths without XID assignment, for example when HOT
> > > > > > > pruning happens while reading a table. So, if a process has been
> > > > > > > executing only read queries while logical decoding is disabled, it has
> > > > > > > XLogLogicalInfoXactCache=0 and can get an XID after logical decoding
> > > > > > > is enabled, writing non-logical WAL records. Such read-only
> > > > > > > transactions don't appear in xl_running_xacts so we cannot wait for
> > > > > > > them to finish during logical decoding initialization.
> > > > > > >
> > > > > > > I think there are two possible solutions to resolve this issue:
> > > > > > >
> > > > > > > 1. Revert the XLogLogicalInfoXactCache idea. While we can simplify the
> > > > > > > code, I'm concerned that it could be problematic if
> > > > > > > XLogLogicalInfoActive() could return different results whenever it's
> > > > > > > called. For instance, we should at least remove the assertion in
> > > > > > > ExecuteTruncateGuts(), but we could face a similar issue in the
> > > > > > > future.
> > > > > >
> > > > > > I'm concerned that even this idea could be problematic without waiting
> > > > > > for all running transactions to finish before enabling logical
> > > > > > decoding, depending on how we write WAL records. If
> > > > > > XLogLogicalInfoActive() could return a different value within the same
> > > > > > command, we could write a WAL record mixiting logical-information and
> > > > > > non-logical information, because some functions use
> > > > > > XLogLogicalInfoActive() several times before writing a one WAL record.
> > > > > > Further, if there is a path where a transaction decides whether to
> > > > > > include logical information based on XLogLogicalInfoActive() *before*
> > > > > > getting an XID, the transaction might not appear in xl_running_xacts
> > > > > > record, resulting in that logical decoding that started before the
> > > > > > transaction decodes such WAL records in the CONSISTENT snapbuild
> > > > > > state.
> > > > > >
> > > > >
> > > > > Are you imagining a case with following steps that could lead to a
> > > > > problem: (a) backend determine whether to include logical_info in WAL
> > > > > via XLogLogicalInfoActive() and the answer is false, (b) assign
> > > > > transaction_id, (c) got a signal to update XLogLogicalInfo to enable
> > > > > logical_info and the backend updates it, (d) backend still writes WAL
> > > > > without logical_info because it uses the information determined in
> > > > > (a), (e) logical decoding won't wait for such a transaction before
> > > > > reaching consistent state as by that time xid was not assigned, (f)
> > > > > decoding results are unpredictable because the WAL doesn't have
> > > > > logical_info?
> > > >
> > > > Exactly. In addition to such potential problems, such WAL records look
> > > > incomplete and bogus, so the validation check could fail if we have in
> > > > the future.
> > > >
> > > > IIUC a similar thing happens even if XLogLogicalInfoActive() returns a
> > > > consistent result within the same transaction, if we don't wait for
> > > > all transactions to finish;
> > > >
> > > > 1. a backend determines not to include logical-info in WAL as
> > > > XLogLogicalInfoActive() returns false.
> > > > 2. logical decoding is enabled and the global signal barrier is sent.
> > > > 3. the backend doesn't update its cache as it's within the transaction
> > > > (will be updated at the next transaction).
> > > > 4. logical decoding started and won't wait for such a transaction
> > > > before reaching a consistent state as the transaction doesn't have an
> > > > XID yet.
> > > > 5. the backend gets an XID and writes WAL records (without logical-info).
> > > > 6. logical decoding decodes such WAL records.
> > > >
> > > > As far as I checked the source code, there does not seem to be such
> > > > code paths but it might be introduced in the future.
> > > >
> > >
> > > I think if we don't have transaction-level cache, then chances are
> > > less that such a code path gets introduced even in the future.
> >
> > Why do you think the chances are less without the transaction-level
> > cache? Did you mean to say "if we have the transaction-level cache"?
> >
>
> I think the chances are higher with transaction-level cache because
> the result is cached. Consider the above 6 steps mentioned by you, in
> step-5, as we have a transaction-level cache, we won't LOG logical-wal
> info and the logical decoding process could go wrong.
>

+1. This can be reproduced as well. When the logical-decoding state is
cached, we may fail to log logical-info (unassigned XID case), causing
certain rows not to be replicated to subscribers. The steps below
demonstrate this.

Backend1 of pub:
-------------
create table tab1(i int);
create publication pub1 for table tab1;

BEGIN;
SELECT txid_current_if_assigned(); --xid not assigned yet.
SHOW wal_level; SHOW effective_wal_level; --replica

--pause here and do 'Step1' mentioned below on backend2.
--logical decoding is now enabled except this backend.
--now continue with backend1:

insert into tab1 values(20);
insert into tab1 values(30);

--pause here and do 'Step2' mentioned below on backend2.
--now continue with backend1:

SELECT txid_current_if_assigned(); --xid gets assigned before above insert.
SHOW wal_level; SHOW effective_wal_level; --it is still 'replica' in this txn.
COMMIT;

Step1 (it will enable logical decoding):
----------------------------
Backend2 of pub:
SELECT pg_create_logical_replication_slot('slot', 'pgoutput', false,
false, false);
show wal_level; show effective_wal_level; --logical now.

Subscriber:
create table tab1(i int);
create subscription sub1 connection '...' publication pub1;

Backend2 of pub: insert into tab1 values(10);
----------------------------

Step2:
--------------------------------
Backend2 of pub: insert into tab1 values(40);
--------------------------------

At the end after backend1 commits:
On pub, we have 4 rows in tab1:
{10}, {20}, {30}, {40}

On sub, we have 2 rows in tab1:
{10}, {40}

~~

If we stop caching the logical-decoding state within a transaction, we
may still encounter issues, because the backend could observe logical
decoding as disabled at one point and enabled at another. But the
current caching behavior seems to expose the problem more directly.
Does that mean that avoiding caching might actually be slightly safer?
Thoughts?

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2025-12-10 10:07:35 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Previous Message Andrey Borodin 2025-12-10 09:53:02 Re: PoC: Simplify recovery after dropping a table by LOGGING the restore LSN