RE: Conflict detection for update_deleted in logical replication

From:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	"Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject:	RE: Conflict detection for update_deleted in logical replication
Date:	2025-07-18 12:02:50
Message-ID:	OS0PR01MB5716688F74F6121B8CE797119450A@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists:	pgsql-hackers

On Friday, July 18, 2025 1:25 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Jul 11, 2025 at 3:58 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Jul 10, 2025 at 6:46 PM Masahiko Sawada
> <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Jul 9, 2025 at 9:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > >
> > > >
> > > > > I think that even with retain_conflict_info = off, there is
> > > > > probably a point at which the subscriber can no longer keep up
> > > > > with the publisher. For example, if with retain_conflict_info =
> > > > > off we can withstand 100 clients running at the same time, then
> > > > > the fact that this performance degradation occurred with 15
> > > > > clients explains that performance degradation is much more
> > > > > likely to occur because of retain_conflict_info = on.
> > > > >
> > > > > Test cases 3 and 4 are typical cases where this feature is used
> > > > > since the conflicts actually happen on the subscriber, so I
> > > > > think it's important to look at the performance in these cases.
> > > > > The worst case scenario for this feature is that when this
> > > > > feature is turned on, the subscriber cannot keep up even with a
> > > > > > small load, and with max_conflict_retention_duration we enter a
> > > > > loop of slot invalidation and re-creating, which means that
> > > > > conflict cannot be detected reliably.
> > > > >
> > > >
> > > > As per the above observations, it is less of a regression of this
> > > > feature but more of a lack of parallel apply or some kind of
> > > > pre-fetch for apply, as is recently proposed [1]. I feel there are
> > > > use cases, as explained above, for which this feature would work
> > > > without any downside, but due to a lack of some sort of parallel
> > > > apply, we may not be able to use it without any downside for cases
> > > > > where the contention is only on a smaller set of tables. We have
> > > > > not tried it, but in such cases, if users distribute the workload
> > > > > among different pub-sub pairs by using row filters, we may also
> > > > > see less regression. We can try that as well.
> > >
> > > > While I understand that there are some possible solutions we have
> > > > today to reduce the contention, I'm not sure these are really
> > > > practical solutions, as they increase the operational costs instead.
> > >
> >
> > I assume by operational costs you mean defining the replication
> > definitions such that workload is distributed among multiple apply
> > workers via subscriptions either by row_filters, or by defining
> > separate pub-sub pairs of a set of tables, right? If so, I agree with
> > you but I can't think of a better alternative. Even without this
> > feature, we know in such cases the replication lag could be
> > large, as is evident in a recent thread [1] and some offlist feedback by
> > people using native logical replication. As per a POC in the
> > thread [1], by parallelizing apply or by using some prefetch, we could
> > reduce the lag, but we need to wait for that work to mature to see the
> > actual effect of it.
>
> I don't have a better alternative either.
>
> I agree that this feature will work without any problem when logical replication
> is properly configured. It's a good point that update-delete conflicts can be
> detected reliably without additional performance overhead in scenarios with
> minimal replication lag.
> However, this approach requires users to pay particular attention to
> replication performance and potential delays. My primary concern is that, given
> the current logical replication performance limitations, most users who want to
> use this feature will likely need such dedicated care for replication lag.
> Nevertheless, most features involve certain trade-offs. Given that this is an
> opt-in feature and future performance improvements will reduce these
> challenges for users, it would be reasonable to have this feature at this stage.
>
> >
> > The path I see with this work is to clearly document the cases
> > (configuration) where this feature could be used without much downside
> > and keep the default value of subscription option to enable this as
> > false (which is already the case with the patch).
>
> +1

Thanks for the discussion. Here is the V49 patch, which includes the suggested
doc change in 0002. I will rebase the remaining patches once the first one is
pushed.

Thanks to Shveta for preparing the doc change.

Best Regards,
Hou zj

Attachment	Content-Type	Size
v49-0001-Preserve-conflict-relevant-data-during-logical-r.patch	application/octet-stream	185.3 KB
v49-0002-refactor-launcher-slot-creation-and-doc-perf.patch	application/octet-stream	8.8 KB

In response to

Re: Conflict detection for update_deleted in logical replication at 2025-07-17 17:25:27 from Masahiko Sawada

Responses

Re: Conflict detection for update_deleted in logical replication at 2025-07-18 21:30:43 from Masahiko Sawada
