Re: Disabled logical replication origin session causes primary key errors

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Shawn McCoy <shawn(dot)the(dot)mccoy(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, drewwcallahan(at)gmail(dot)com, "scott(at)meads(dot)us" <scott(at)meads(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Disabled logical replication origin session causes primary key errors
Date: 2025-04-20 05:53:26
Message-ID: CAD21AoA1K8pD-90B-KW_8ifRRQU7_nOBZTo3We6ENpu+uCSdUw@mail.gmail.com
Lists: pgsql-bugs

Hi,

Thank you for the report and sharing the reproducers.

On Fri, Apr 18, 2025 at 2:22 PM Shawn McCoy <shawn(dot)the(dot)mccoy(at)gmail(dot)com> wrote:
>
> Hello,
>
> We have discovered a recent regression in the origin handling of logical replication apply workers. The cause is that the worker resets its local origin session information while processing an error that is silently handled, which allows the worker to continue. We suspect this is caused by the recent change made in the following thread: https://www.postgresql.org/message-id/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92@TYAPR01MB5692.jpnprd01.prod.outlook.com.
>
> The logical replication apply worker initially sets up the origin correctly. However, the first insert calls into the trigger, which raises an exception. Raising this exception runs the error callback that resets the origin session state. The exception is then silently handled, returning execution to the apply worker. In the second reproduction, a function-based index is used with the same result.
>
> At this point, the apply worker can continue to commit these changes, but it has cleared all local origin session state. As a result, it does not update the remote-to-local LSN mapping of the origin, which allows duplicate data to be applied.

I agree with your analysis. When the subscriber restarts the logical
replication, changes that happened since the last acknowledged LSN
would be replicated again.

With commit 3f28b2fcac33f, we reset the replication origin in
apply_error_callback(), but I guess moving the reset to the PG_CATCH()
block in start_apply() might work.
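
To illustrate why the placement matters, here is a minimal standalone
sketch, not PostgreSQL code. It models error handling with
setjmp/longjmp, and every name in it (run_apply, raise_error,
trigger_with_exception_block, origin_session_active, handler_stack) is
hypothetical. It assumes the error callback runs when the error is
raised, before any handler decides whether to swallow it.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the worker's origin session state. */
static bool origin_session_active;

/* Innermost active handler; a toy model of an exception stack. */
static jmp_buf *handler_stack;

typedef enum
{
    RESET_IN_ERROR_CALLBACK,    /* reset when the error is raised */
    RESET_IN_OUTER_CATCH        /* reset only in the worker's catch block */
} ResetPlacement;

static ResetPlacement placement;

static void
reset_origin(void)
{
    origin_session_active = false;
}

/* Raise an error: run the "error callback", then jump to the handler. */
static void
raise_error(void)
{
    assert(handler_stack != NULL);
    if (placement == RESET_IN_ERROR_CALLBACK)
        reset_origin();         /* runs even if the error is swallowed */
    longjmp(*handler_stack, 1);
}

/* Models a trigger that raises an error and handles it itself. */
static void
trigger_with_exception_block(void)
{
    jmp_buf     local;
    jmp_buf    *saved = handler_stack;

    handler_stack = &local;
    if (setjmp(local) == 0)
        raise_error();
    /* Reached on the error path too: the error is silently handled. */
    handler_stack = saved;
}

/* Models the apply loop; returns true if the origin state survived. */
bool
run_apply(ResetPlacement p, bool fatal_error)
{
    jmp_buf     outer;

    placement = p;
    origin_session_active = true;   /* origin set up at worker start */
    handler_stack = &outer;

    if (setjmp(outer) == 0)
    {
        trigger_with_exception_block();     /* handled internally */
        if (fatal_error)
            raise_error();                  /* reaches the worker */
    }
    else if (placement == RESET_IN_OUTER_CATCH)
        reset_origin();         /* only errors that escape get here */

    handler_stack = NULL;
    return origin_session_active;
}
```

In this model, run_apply(RESET_IN_ERROR_CALLBACK, false) loses the
origin state even though the worker keeps running, while
run_apply(RESET_IN_OUTER_CATCH, false) keeps it. With
RESET_IN_OUTER_CATCH the state is still reset when an error actually
propagates to the outer handler.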

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
