RE: Conflict detection for update_deleted in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2025-09-01 11:37:30
Message-ID: TY4PR01MB169074D21BF4868F7FB2730159407A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Monday, September 1, 2025 12:45 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Friday, August 29, 2025 6:28 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Fri, Aug 29, 2025 at 11:49 AM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com>
> > wrote:
> > >
> > > Here is the new version patch set which also addressed Shveta's
> > comments[1].
> > >
> >
> > Thanks for the patch.
> >
> > On 0001 alone, I’m observing a behavior where, if sub1 has stopped
> > retention, and I then create a new subscription sub2, the worker for
> > sub2 fails to start successfully. It repeatedly starts and exits,
> > logging the following message:
> >
> > LOG: logical replication worker for subscription "sub2" will restart
> > because the option retain_dead_tuples was enabled during startup
> >
> > The same thing happens when I disable and re-enable 'retain_dead_tuples' for
> > any sub once the slot has an invalid xmin.
>
> I think this behavior occurs because slot.xmin is set to an invalid value and the
> 0001 patch has no slot recovery logic, so even if retentionactive is true, newly
> created subscriptions cannot obtain a valid oldest_nonremovable_xid.
>
> After thinking about it more, I decided to add slot recovery functionality to 0001
> as well, which avoids the need for additional checks here. I also adjusted the
> documentation accordingly.
>
> Here is the V69 patch set, which addresses the above comments and the latest
> comment from Nisha[1].

I reviewed the patch internally and made a small tweak to the apply worker: when
max_retention_duration is set, the main loop now waits at most that long
(wait_time = min(wait_time, max_retention_duration)). I also added a simple test
to 035_conflicts.pl in 0001 to verify the new subscription option.
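
As a rough illustration of the wait-time adjustment described above (this is not
the patch code, which is C inside the apply worker): the function name, the
millisecond units, and the convention that 0 means "no retention limit" are all
assumptions made for this sketch.

```python
def next_wait_time_ms(wait_time_ms: int, max_retention_duration_ms: int) -> int:
    """Return how long the main loop should sleep before its next iteration.

    Hypothetical helper: names and units are illustrative, not from the patch.
    """
    # Assumption: a value of 0 means no retention limit is configured,
    # so the regular wait time is used unchanged.
    if max_retention_duration_ms > 0:
        # Cap the sleep so the worker wakes up in time to notice that the
        # retention duration has been exceeded.
        return min(wait_time_ms, max_retention_duration_ms)
    return wait_time_ms
```

With a 1000 ms default wait, a 200 ms limit would shorten the sleep to 200 ms,
while a 5000 ms limit (or no limit) would leave it at 1000 ms.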

Here is the V70 patch set.

Best Regards,
Hou zj

Attachment                                                        Content-Type               Size
v70-0003-Add-a-dead_tuple_retention_active-column-in-pg_s.patch   application/octet-stream   8.1 KB
v70-0001-Add-max_retention_duration-option-to-subscriptio.patch   application/octet-stream   107.2 KB
v70-0002-Resume-retaining-the-information-for-conflict-de.patch   application/octet-stream   18.1 KB
