| From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, suyu(dot)cmj <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, tomas <tomas(at)vondra(dot)me>, michael <michael(at)paquier(dot)xyz>, bharath(dot)rupireddyforpostgres <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Subject: | RE: Newly created replication slot may be invalidated by checkpoint |
| Date: | 2025-12-08 10:24:46 |
| Message-ID: | TY4PR01MB169079865E5A86A975547679294A2A@TY4PR01MB16907.jpnprd01.prod.outlook.com |
| Lists: | pgsql-hackers |
On Monday, December 8, 2025 5:47 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Dec 8, 2025 at 12:53 PM Masahiko Sawada
> <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Dec 5, 2025 at 4:10 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > >
> > > On Thu, Dec 4, 2025 at 12:12 PM Zhijie Hou (Fujitsu)
> > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > >
> > > > Here are the updated patches for HEAD and 18. I did not add tests
> > > > since, after applying the patch and resolving the issue, the only
> > > > > observable behavior is that the checkpoint will wait for another
> > > > > backend to finish creating a slot because of the LWLock, so it does not
> > > > > seem worth testing only an LWLock wait event (I could not find similar tests).
> > > >
> > >
> > > Fair enough. The patch looks mostly good to me, attached are minor
> > > comment improvements atop the HEAD patch. I'll do some more testing
> > > before push.
> > >
> > > Sawada-san/Vitaly, do you have any opinion on patch or the direction
> > > to fix? The idea is to get this fixed for HEAD and 18, then continue
> > > > discussion for other back-branches and the remaining patches.
> >
> > +1
> >
>
> Thanks, Pushed. I'll continue thinking on how to fix it in branches prior to 18
> and other problems reported in this thread.
Thanks for pushing. I thought about whether it's possible to apply a similar fix
to the back-branches. One approach could be to take ReplicationSlotAllocationLock
in two places: in exclusive mode during WAL reservation, and in shared mode
during the minimum LSN calculation at checkpoints, to serialize the two steps.
The logic is similar to HEAD: it ensures that, if WAL reservation
occurs first, the checkpoint waits until restart_lsn is updated before
calculating the minimum LSN. If the checkpoint runs first, subsequent WAL
reservations pick a position at or after the latest checkpoint's redo pointer.
Here is the patch based on PG17 for reference.
Best Regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v9_PG17-0001-Prevent-invalidation-of-newly-created-replic.patch | application/octet-stream | 7.8 KB |