Re: Add an option to skip loading missing publication to avoid logical replication failure

From:	Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To:	vignesh C <vignesh21(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Add an option to skip loading missing publication to avoid logical replication failure
Date:	2025-05-02 10:44:31
Message-ID:	CABPTF7XH8Uh+K-x3RMt6fOkK3xwSD2YVQehCfp_hb1TS0abe+w@mail.gmail.com
Lists:	pgsql-hackers

Yes, thanks for the clarification. I have a basic understanding of it now.
What I meant to ask is whether this is considered a bug or a design defect
in the codebase. If so, should we prevent it from occurring in general, not
just for this specific test?

vignesh C <vignesh21(at)gmail(dot)com> wrote:

>
> We have three processes involved in this scenario:
> - A walsender process on the publisher, responsible for decoding and
>   sending WAL changes.
> - An apply worker process on the subscriber, which applies the changes.
> - A session executing the ALTER SUBSCRIPTION command.
>
> Due to the asynchronous nature of these processes, the ALTER
> SUBSCRIPTION command may not be immediately observed by the apply
> worker. Meanwhile, the walsender may process and decode an INSERT
> statement.
> If the insert targets a table (e.g., tab_3) that does not belong to
> the current publication (pub1), the walsender silently skips
> replicating the record and advances its decoding position. This
> position is sent in a keepalive message to the subscriber, and since
> there are no pending transactions to flush, the apply worker reports
> it as the latest received LSN.
> Later, when the apply worker eventually detects the subscription
> change, it restarts—but by then, the insert has already been skipped
> and is no longer eligible for replay, as the table was not part of the
> publication (pub1) at the time of decoding.
> This race condition arises because the three processes run
> independently and may progress at different speeds due to CPU
> scheduling or system load.
> Thoughts?
>
> Regards,
> Vignesh
>
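
To make the ordering problem in the quoted explanation concrete, here is a
small deterministic sketch of it. It is only a toy model and not PostgreSQL
code: the classes Walsender and ApplyWorker, the table tab_1, the LSN value
100, and the idea of a second publication set that contains tab_3 are all
assumptions made for illustration. Only tab_3, pub1, and the general
sequence of events come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    lsn: int
    table: str
    row: str


@dataclass
class Walsender:
    """Toy stand-in for the publisher-side decoder (hypothetical)."""
    wal: list

    def stream(self, start_lsn, published_tables):
        # Decode every record after start_lsn. A record whose table is not
        # published is skipped, but its position is still reported, which
        # models the keepalive message carrying the decoding position.
        for change in self.wal:
            if change.lsn <= start_lsn:
                continue
            if change.table in published_tables:
                yield change.lsn, change
            else:
                yield change.lsn, None


@dataclass
class ApplyWorker:
    """Toy stand-in for the subscriber-side apply worker (hypothetical)."""
    confirmed_lsn: int = 0
    applied: list = field(default_factory=list)

    def receive(self, lsn, change):
        if change is not None:
            self.applied.append(change.row)
        # With nothing pending to flush, the received position is reported
        # back as the latest received LSN.
        self.confirmed_lsn = lsn


def simulate(alter_seen_before_decode):
    wal = [Change(lsn=100, table="tab_3", row="tab_3:1")]
    sender = Walsender(wal)
    worker = ApplyWorker()

    pub1 = {"tab_1"}                   # assumed contents of pub1
    after_alter = {"tab_1", "tab_3"}   # assumed view after ALTER SUBSCRIPTION

    if not alter_seen_before_decode:
        # The worker has not noticed the ALTER yet, so decoding still
        # happens against the old publication list.
        for lsn, change in sender.stream(worker.confirmed_lsn, pub1):
            worker.receive(lsn, change)

    # The worker notices the change and restarts from its confirmed LSN.
    for lsn, change in sender.stream(worker.confirmed_lsn, after_alter):
        worker.receive(lsn, change)

    return worker.applied, worker.confirmed_lsn
```

In this model, simulate(True) applies the tab_3 row, while simulate(False)
ends with the same confirmed LSN but nothing applied, because the restart
begins after the position that was already acknowledged. The real
processes interleave nondeterministically; the boolean flag just selects
which of the two orderings is replayed.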

In response to

Re: Add an option to skip loading missing publication to avoid logical replication failure at 2025-05-02 05:22:43 from vignesh C
