Quick Links

RE: Conflict detection for update_deleted in logical replication

From:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>
Subject:	RE: Conflict detection for update_deleted in logical replication
Date:	2025-08-27 03:29:51
Message-ID:	TY4PR01MB169079F275EE027F29E2D24889438A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tuesday, August 26, 2025 2:45 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Mon, Aug 25, 2025 at 5:05 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> > A few comments on 0001:
> >
>
> Some more comments:

Thanks for the comments!

> 1.
> + /*
> + * Return false if the leader apply worker has stopped retaining
> + * information for detecting conflicts. This implies that
> + update_deleted
> + * can no longer be reliably detected.
> + */
> + if (!retention_active)
> + return false;
> +
> /*
> * For conflict detection, we use the conflict slot's xmin value instead
> * of invoking GetOldestNonRemovableTransactionId(). The slot.xmin acts as
> @@ -3254,7 +3315,15 @@ FindDeletedTupleInLocalRel(Relation localrel, Oid
> localidxoid,
> oldestxmin = slot->data.xmin;
> SpinLockRelease(&slot->mutex);
>
> - Assert(TransactionIdIsValid(oldestxmin));
> + /*
> + * Return false if the conflict detection slot.xmin is set to
> + * InvalidTransactionId. This situation arises if the current worker is
> + * either a table synchronization or parallel apply worker, and the
> + leader
> + * stopped retention immediately after checking the
> + * oldest_nonremovable_xid above.
> + */
> + if (!TransactionIdIsValid(oldestxmin))
> + return false;
>
> If the current worker is tablesync or parallel_apply, it should have exited from
> the above check of retention_active as we get the leader's
> oldest_nonremovable_xid to decide that. What am, I missing? This made me
> wonder whether we need to use slot's xmin after we have fetched leader's
> oldest_nonremovable_xid to find deleted tuple?

There was a race condition that the leader could stop retention immediately
after the pa worker fetches its oldest_nonremovable_xid. But I agree that it
would be simpler to directly use oldest_nonremovable_xid to find deleted tuple,
so changed. This way, the logic can be simplified.

>
> 2.
> - * The interval is reset to a minimum value of 100ms once there is some
> - * activity on the node.
> + * The interval is reset to the lesser of 100ms and
> + * max_conflict_retention_duration once there is some activities on the node.
> AFAICS, this is not adhered in the code because you are using it when there is
> no activity aka when new_xid_found is false. IS the comment wrong or code
> needs some updation?

I think both needs to be updated. I have adjusted the code to consider
max_retention in both cases and update the comments.

All the comments have been addressed in V66 patch set.

Best Regards,
Hou zj

In response to

Re: Conflict detection for update_deleted in logical replication at 2025-08-26 06:44:56 from Amit Kapila

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Zhijie Hou (Fujitsu)	2025-08-27 03:29:54	RE: Conflict detection for update_deleted in logical replication
Previous Message	Zhijie Hou (Fujitsu)	2025-08-27 03:29:12	RE: Conflict detection for update_deleted in logical replication