| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Cc: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "suyu(dot)cmj" <mengjuan(dot)cmj(at)alibaba-inc(dot)com>, tomas <tomas(at)vondra(dot)me>, michael <michael(at)paquier(dot)xyz>, "bharath(dot)rupireddyforpostgres" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
| Subject: | Re: Newly created replication slot may be invalidated by checkpoint |
| Date: | 2025-11-20 10:44:10 |
| Message-ID: | CAA4eK1KEZ8b+MMD_vVXPe3AA_eSkrfknFaed5Yhn-sysCuZEsQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Thu, Nov 20, 2025 at 4:07 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Thursday, November 20, 2025 4:26 PM Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> wrote:
>
> >
> > Concerning reserve_wal_for_local_slot. It seems it is used in synchronization
> > of failover logical slots. For me, it is tricky to change restart_lsn of a synced
> > logical slot to RedoRecPtr, because it may lead to problems with logical
> > replication using such slot after the replica promotion. But it seems it is the
> > architectural problem and it is not related to the problems, solved by the
> > patch.
>
> I think this is not an issue because if we use the redo pointer instead of the
> remote restart_lsn as the initial value, the synced slot won't be marked as
> sync-ready, so user cannot use it after promotion (also well documented). This
> is also the existing behavior before the patch, e.g., if the required WALs were
> removed, the oldest available WAL was used as the initial value, similarly
> resulting in the slot not being sync-ready.
>
Would it be better to discuss this in a separate thread? Though this
is related to original problem but still in a separate part of code
(slotsync) which I think can have a separate fix especially when the
fix is also somewhat different.
> >
> > The change of lock mode to EXCLUSIVE in
> > ReplicationSlotsComputeRequiredLSN may affect the performance when a lot
> > of slots are advanced during some small period of time. It may affect the
> > walsender performance. It advances the logical or physical slots when receive
> > a confirmation from the replica. I guess, the slot advancement may be pretty
> > frequent operation.
>
> Yes, I had the same thought and considered a simple alternative (similar to your
> suggestion below): use an exclusive lock only when updating the slot.restart_lsn
> during WAL reservation, while continuing to use a shared lock in the computation
> function. Additionally, place XLogSetReplicationSlotMinimumLSN() under the lock.
> This approach will also help serialize the process.
>
Can we discuss this as well in a separate thread?
--
With Regards,
Amit Kapila.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | John Naylor | 2025-11-20 10:57:25 | Re: tuple radix sort |
| Previous Message | Álvaro Herrera | 2025-11-20 10:42:39 | Re: 10% drop in code line count in PG 17 |