| From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, suyu(dot)cmj <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, tomas <tomas(at)vondra(dot)me>, michael <michael(at)paquier(dot)xyz>, bharath(dot)rupireddyforpostgres <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
| Subject: | RE: Newly created replication slot may be invalidated by checkpoint |
| Date: | 2025-11-21 06:14:38 |
| Message-ID: | TY4PR01MB1690710D18F1B7C825CBC5D7494D5A@TY4PR01MB16907.jpnprd01.prod.outlook.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Thursday, November 20, 2025 6:44 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Nov 20, 2025 at 4:07 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com>
> wrote:
> >
> > On Thursday, November 20, 2025 4:26 PM Vitaly Davydov
> <v(dot)davydov(at)postgrespro(dot)ru> wrote:
> >
> > >
> > > Concerning reserve_wal_for_local_slot. It seems it is used in
> > > synchronization of failover logical slots. For me, it is tricky to
> > > change restart_lsn of a synced logical slot to RedoRecPtr, because
> > > it may lead to problems with logical replication using such slot
> > > after the replica promotion. But it seems it is the architectural
> > > problem and it is not related to the problems, solved by the patch.
> >
> > I think this is not an issue because if we use the redo pointer
> > instead of the remote restart_lsn as the initial value, the synced
> > slot won't be marked as sync-ready, so user cannot use it after
> > promotion (also well documented). This is also the existing behavior
> > before the patch, e.g., if the required WALs were removed, the oldest
> > available WAL was used as the initial value, similarly resulting in the slot not
> being sync-ready.
> >
>
> Would it be better to discuss this in a separate thread? Though this is related
> to original problem but still in a separate part of code
> (slotsync) which I think can have a separate fix especially when the fix is also
> somewhat different.
>
> > >
> > > The change of lock mode to EXCLUSIVE in
> > > ReplicationSlotsComputeRequiredLSN may affect the performance when
> a
> > > lot of slots are advanced during some small period of time. It may
> > > affect the walsender performance. It advances the logical or
> > > physical slots when receive a confirmation from the replica. I
> > > guess, the slot advancement may be pretty frequent operation.
> >
> > Yes, I had the same thought and considered a simple alternative
> > (similar to your suggestion below): use an exclusive lock only when
> > updating the slot.restart_lsn during WAL reservation, while continuing
> > to use a shared lock in the computation function. Additionally, place
> XLogSetReplicationSlotMinimumLSN() under the lock.
> > This approach will also help serialize the process.
> >
>
> Can we discuss this as well in a separate thread?
OK, I think it makes sense to start separate threads.
I have split the patches based on the different bugs they
address and am sharing them here for reference.
Best Regards,
Hou zj
| Attachment | Content-Type | Size |
|---|---|---|
| v6-0003-Fix-race-conditions-causing-invalidation-of-newly.patch | application/octet-stream | 7.8 KB |
| v6-0001-Fix-the-race-condition-between-slot-wal-reservati.patch | application/octet-stream | 6.0 KB |
| v6-0002-Fix-the-race-condition-of-updating-slot-minimum-L.patch | application/octet-stream | 3.1 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2025-11-21 06:26:25 | Re: CREATE/ALTER PUBLICATION improvements for syntax synopsis |
| Previous Message | Amit Kapila | 2025-11-21 06:00:25 | Re: How can end users know the cause of LR slot sync delays? |