| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Add an option to skip loading missing publication to avoid logical replication failure |
| Date: | 2025-05-02 10:44:31 |
| Message-ID: | CABPTF7XH8Uh+K-x3RMt6fOkK3xwSD2YVQehCfp_hb1TS0abe+w@mail.gmail.com |
| Lists: | pgsql-hackers |
Yes, thanks for the clarification. I have a basic understanding of it now.
What I meant to ask is: is this considered a bug or a design defect in the
codebase? If so, should we prevent it from occurring in general, not just
for this specific test?
vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> We have three processes involved in this scenario:
> - A walsender process on the publisher, responsible for decoding and
>   sending WAL changes.
> - An apply worker process on the subscriber, which applies the changes.
> - A session executing the ALTER SUBSCRIPTION command.
>
> Due to the asynchronous nature of these processes, the ALTER
> SUBSCRIPTION command may not be immediately observed by the apply
> worker. Meanwhile, the walsender may process and decode an INSERT
> statement.
> If the insert targets a table (e.g., tab_3) that does not belong to
> the current publication (pub1), the walsender silently skips
> replicating the record and advances its decoding position. This
> position is sent in a keepalive message to the subscriber, and since
> there are no pending transactions to flush, the apply worker reports
> it as the latest received LSN.
> Later, when the apply worker eventually detects the subscription
> change, it restarts—but by then, the insert has already been skipped
> and is no longer eligible for replay, as the table was not part of the
> publication (pub1) at the time of decoding.
> This race condition arises because the three processes run
> independently and may progress at different speeds due to CPU
> scheduling or system load.
> Thoughts?
>
> Regards,
> Vignesh
>
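
The interleaving described in the quoted message can be sketched as a toy Python model. This is only an illustration of the ordering problem, not PostgreSQL code: the `Walsender` class, the `simulate` function, the LSN value 100, and the names `pub2` and `tab_1` are assumptions made up for the example (only `pub1` and `tab_3` come from the quoted text), and it assumes the ALTER SUBSCRIPTION adds a publication that contains tab_3.

```python
from dataclasses import dataclass


@dataclass
class Walsender:
    """Toy stand-in for the publisher-side walsender (hypothetical model)."""
    publications: set      # publication names this connection was started with
    pub_tables: dict       # publication name -> set of published table names
    decoded_lsn: int = 0   # position up to which WAL has been decoded

    def decode(self, wal):
        """Decode records past decoded_lsn; return (changes_sent, keepalive_lsn)."""
        sent = []
        for lsn, table in wal:
            if lsn <= self.decoded_lsn:
                continue  # already decoded, never revisited
            if any(table in self.pub_tables[p] for p in self.publications):
                sent.append((lsn, table))
            # The decoding position advances whether or not the change was sent.
            self.decoded_lsn = lsn
        return sent, self.decoded_lsn


def simulate(alter_seen_before_decode: bool):
    """Return (applied_changes, confirmed_lsn) for one of the two orderings."""
    pub_tables = {"pub1": {"tab_1"}, "pub2": {"tab_3"}}  # pub2, tab_1: assumed
    wal = [(100, "tab_3")]                               # one INSERT into tab_3
    old_pubs, new_pubs = {"pub1"}, {"pub1", "pub2"}
    applied = []

    # Publication list in effect when the walsender reaches the INSERT.
    active = new_pubs if alter_seen_before_decode else old_pubs
    sent, keepalive = Walsender(set(active), pub_tables).decode(wal)
    applied += sent
    # With nothing pending to flush, the apply worker reports the keepalive
    # position as its latest received LSN.
    confirmed = keepalive

    # The apply worker (re)starts with the new list and resumes after the
    # confirmed position, so anything skipped earlier is not replayed.
    restarted = Walsender(set(new_pubs), pub_tables, decoded_lsn=confirmed)
    sent, keepalive = restarted.decode(wal)
    applied += sent
    return applied, max(confirmed, keepalive)
```

In this model, `simulate(False)` (the walsender decodes the INSERT before the apply worker notices the subscription change) ends with nothing applied even though the confirmed position has moved past the INSERT, while `simulate(True)` applies the row. The only difference between the two runs is the relative timing of the processes.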