RE: Newly created replication slot may be invalidated by checkpoint

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>
Cc: suyu(dot)cmj <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, tomas <tomas(at)vondra(dot)me>, michael <michael(at)paquier(dot)xyz>, bharath(dot)rupireddyforpostgres <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>
Subject: RE: Newly created replication slot may be invalidated by checkpoint
Date: 2025-12-09 02:06:42
Message-ID: TY4PR01MB169071C25B2288ECD746AA65094A3A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, December 9, 2025 12:40 AM Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> wrote:
>
> On Monday, December 08, 2025 13:24 MSK, "Zhijie Hou (Fujitsu)"
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> > On Monday, December 8, 2025 5:47 PM Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > Sawada-san/Vitaly, do you have any opinion on patch or the
> > > > > direction to fix? The idea is to get this fixed for HEAD and 18,
> > > > > then continue discussion for other bank-branches and the remaining
> patches.
>
> Hi Amit, Zhijie Hou
>
> Thank you for preparing and comiting 0001 patch. I'm ok with it. I did some
> auto testing of the patch and haven't found any problems. As I realized,
> another two patches (0002, 0003) are still in review.

Thanks for testing!

>
> In my previous email I wrote about copy_replication_slot, where restart_lsn is
> assigned without any locks, but I'm not sure that email was successfully
> delivered. Masahiko Sagawa mentioned about it in one of the latest emails as
> well. I also read the answer but not completely understood it at the moment,
> sorry (need some more time to investigate). Anyway, I would prefer to use
> locks in create_physical_replication_slot rather than rely on signals handling
> which may be changed in the future.

If we want to improve that, taking lock when updating restart_lsn does not work,
because the initial restart_lsn is an old position copied from another slot,
where the WALs could have already been removed, so unlike the mechanism
mentioned in 006dd4b2, the lock cannot ensure the same. I think we might need
to some other solutions for it which can be discussed separately.

>
> One more thing, when we copy a logical replication slot,
> DecodingContextFindStartpoint reads the WAL from the specified restart_lsn
> which may be removed by a concurrent checkpoint. It can produce an error
> and stop slot copying, I guess. This behaviour may be not desirable.

Given that we don't search for a starting point for copied slots (we
pass find_startpoint=false when copying), I don't think this issue exists.

Best Regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2025-12-09 03:38:53 Re: [Proposal] Adding callback support for custom statistics kinds
Previous Message Tom Lane 2025-12-09 01:30:55 Re: Let's add a test for NLS translation of PRI* macros