Re: Conflict detection for update_deleted in logical replication

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2025-09-11 10:53:58
Message-ID: CAFiTN-uXki5nX+1RpFvo++dT1NSrEJhPxGtJAqQRS89vuPX7pA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 10, 2025 at 2:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Sep 10, 2025 at 9:08 AM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Tuesday, September 9, 2025 7:01 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > On Tue, Sep 9, 2025 at 11:47 AM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com>
> > > wrote:
> > >
> > > Few other comments:
> > > 1.
> > > + /*
> > > + * Return if the launcher has not initialized oldest_nonremovable_xid.
> > > + *
> > > + * It might seem feasible to directly check the conflict detection
> > > + * slot.xmin instead of relying on the launcher to assign the worker's
> > > + * oldest_nonremovable_xid; however, that could lead to a race
> > > + condition
> > > + * where slot.xmin is set to InvalidTransactionId immediately after the
> > > + * check. In such cases, oldest_nonremovable_xid would no longer be
> > > + * protected by a replication slot and could become unreliable if a
> > > + * wraparound occurs.
> > > + */
> > > + if (!TransactionIdIsValid(nonremovable_xid))
> > > + return;
> > >
> > > I understand the reason why we get assigned the worker's non_removable_xid
> > > from launcher but all this makes the code complex to understand. Isn't there
> > > any other way to achieve it? One naïve and inefficient way could be to just
> > > restart the worker probably after updating its retentionactive flag. I am not
> > > suggesting to make this change but just a brainstorming point.
> >
> > I'll keep thinking about it, and for now, I've added a comment mentioning
> > that rebooting is a simpler solution.
> >
>
> Say we have a LW LogicalRepRetentionLock. We acquire it in SHARED mode
> as soon as we encounter the first subscription with retain_dead_tuples
> set during the traversal of the sublist. We release it only after
> updating xmin outside the sublist traversal. We then acquire it in
> EXCLUSIVE mode to fetch the resume the retention in worker for the
> period till we fetch slot's xmin.
>
> This will close the above race condition but acquiring LWLock while
> traversing subscription is not advisable as that will make it
> uninterruptible. The other possibility is to use some heavy-weight
> lock here, say a lock on pg_subscription catalog but that has a
> drawback that it can conflict with DDL operations. Yet another way is
> to invent a new lock-type for this.
>
> OTOH, the simple strategy to let apply-worker restart to resume
> retention will keep the handling simpler. We do something similar at
> the start of apply-worker if we find that some subscription parameter
> is changed. I think we need more opinions on this matter.

IMHO the situation of retention being disabled and re-enabled is not a
common occurrence. It typically happens in specific scenarios where
there's a significant replication lag or the user has not configured
the retention timeout correctly. Because these are corner cases, I
believe we should avoid over-engineering a solution and simply restart
the worker, as Amit suggested.

--
Regards,
Dilip Kumar
Google

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Burd 2025-09-11 10:56:07 Re: [PATCH] Add tests for Bitmapset
Previous Message Vamshikrishna T 2025-09-11 10:48:57 Re: AIX support