RE: Conflict detection for update_deleted in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2025-09-10 03:38:47
Message-ID: TY4PR01MB169075232AE3A6E643F0EB749940EA@TY4PR01MB16907.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, September 9, 2025 7:01 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Tue, Sep 9, 2025 at 11:47 AM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com>
> wrote:
> >
> > Here is V71 patch set which addressed above comments and [1].
> >
>
> IIUC, this patch after stopping the retention, it immediately starts retrying to
> resume by transitioning through various phases. This can consume CPU and
> network resources, if the apply_worker takes a long time to catch up.

I agree. I think one way is to increase the interval in each cycle when the retention has
been stopped and the worker is retrying to resume the retention. I have
updated the patch for the same.

>
> Few other comments:
> 1.
> + /*
> + * Return if the launcher has not initialized oldest_nonremovable_xid.
> + *
> + * It might seem feasible to directly check the conflict detection
> + * slot.xmin instead of relying on the launcher to assign the worker's
> + * oldest_nonremovable_xid; however, that could lead to a race
> + condition
> + * where slot.xmin is set to InvalidTransactionId immediately after the
> + * check. In such cases, oldest_nonremovable_xid would no longer be
> + * protected by a replication slot and could become unreliable if a
> + * wraparound occurs.
> + */
> + if (!TransactionIdIsValid(nonremovable_xid))
> + return;
>
> I understand the reason why we get assigned the worker's non_removable_xid
> from launcher but all this makes the code complex to understand. Isn't there
> any other way to achieve it? One naïve and inefficient way could be to just
> restart the worker probably after updating its retentionactive flag. I am not
> suggesting to make this change but just a brainstorming point.

I'll keep thinking about it, and for now, I've added a comment mentioning
that rebooting is a simpler solution.

>
> 2. The function should_stop_conflict_info_retention() is invoked from
> wait_for_local_flush() and then it can lead further state transitioning which
> doesn't appear neat and makes code difficult to understand.

I changed the logic to avoid proceeding to next phase when
the retention is stopped.

>
> 3.
> + /*
> + * If conflict info retention was previously stopped due to a timeout,
> + and
> + * the time required to advance the non-removable transaction ID has
> + now
> + * decreased to within acceptable limits, resume the rentention.
> + */
> + if (!MySubscription->retentionactive)
> + {
> + rdt_data->phase = RDT_RESUME_CONFLICT_INFO_RETENTION;
> + process_rdt_phase_transition(rdt_data, false); return; }
>
> In this check, where do we check the time has come in acceptable range? Can
> you update comments to make it clear?

I updated the comments to mention that the code can reach here only
when the time is within max_retention_duration.

Here is the V72 patch set which addressed above and Shveta's comments[1].

[1] https://www.postgresql.org/message-id/CAJpy0uDw7SmCN_jOvpNUzo_sE4ZsgpqQ5_vHLjm4aCm10eBApA%40mail.gmail.com

Best Regards,
Hou zj

Attachment Content-Type Size
v72-0001-Allow-conflict-relevant-data-retention-to-resume.patch application/octet-stream 22.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2025-09-10 03:39:00 RE: Conflict detection for update_deleted in logical replication
Previous Message Peter Smith 2025-09-10 03:38:38 Re: PostgreSQL 18 GA press release draft