From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
---|---|
To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
Date: | 2025-08-29 05:01:39 |
Message-ID: | CAJpy0uASzojKbzinpNu29xuYGsSRnSo=22CLhXaSt_43TVoBhQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Aug 26, 2025 at 9:58 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Fri, Aug 22, 2025 at 3:44 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Wed, Aug 20, 2025 at 10:53 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > >
> > >
> > > I've removed them.
> > > Attaching patch v8 addressing the above comments.
> > >
> >
> > Thanks for the patch. Please find a few comments:
> >
> > 1)
> > When the API is in progress, and meanwhile in another session we turn
> > off hot_standby_feedback, the API session terminates abnormally.
> >
> > postgres=# SELECT pg_sync_replication_slots();
> > server closed the connection unexpectedly
> >
> > It seems slotsync_reread_config() is not adjusted for API. It does
> > proc_exit assuming it is a worker process.
> >
>
> I've removed the API calling ProcessSlotSyncInterrupts() as I don't
> think the API needs to specifically handle shutdown requests, it works
> without calling this. And the other config checks, I don't think the
> API needs to handle, I think we should leave it to the user.
>
> > 2)
> > slotsync_worker_onexit():
> >
> > SlotSyncCtx->slot_persistence_pending = false;
> >
> > /*
> > * If syncing_slots is true, it indicates that the process errored out
> > * without resetting the flag. So, we need to clean up shared memory and
> > * reset the flag here.
> > */
> > if (syncing_slots)
> > {
> > SlotSyncCtx->syncing = false;
> > syncing_slots = false;
> > }
> >
> > Shall we reset slot_persistence_pending inside 'if (syncing_slots)'?
> > slot_persistence_pending can not be true without syncing_slots being
> > true.
> >
> > 3)
> > reset_syncing_flag():
> >
> > SpinLockAcquire(&SlotSyncCtx->mutex);
> > SlotSyncCtx->syncing = false;
> > + SlotSyncCtx->slot_persistence_pending = false;
> > SpinLockRelease(&SlotSyncCtx->mutex);
> >
> > Here we are changing slot_persistence_pending under mutex, while at
> > other places, it is not protected by mutex. Is it intentional here?
> >
> > 4)
> > On rethinking, we maintain anything in shared memory if it has to be
> > shared between a few processes. 'slot_persistence_pending' OTOH is
> > required to be set and accessed by only one process at a time. Shall
> > we move it out of SlotSyncCtxStruct and keep it static similar to
> > 'syncing_slots'? Rest of the setting, resetting flow remains the same.
> > What do you think?
> >
>
> Yes, I agree. I have modified it accordingly.
>
> >
> > 5)
> > /* Build the query based on whether we're fetching all or refreshing
> > specific slots */
> >
> > Perhaps we can shift this comment to where we actually append
> > target_slot_list. Better to have it something like:
> > 'If target_slot_list is provided, construct the query only to fetch given slots'
> >
>
> Changed.
>
> Attaching patch v9 addressing the above comments.
>
Thank You for the patches. Please find a few comments.
1)
Change not needed:
* without waiting for SLOTSYNC_RESTART_INTERVAL_SEC.
+ *
*/
2)
Regarding the naptime in API, I was thinking why not to use
wait_for_slot_activity() directly? If there are no slots activity, it
will keep on doubling the naptime starting from 2sec till max of
30sec. Thoughts?
We can rename MIN_SLOTSYNC_WORKER_NAPTIME_MS and
MAX_SLOTSYNC_WORKER_NAPTIME_MS to MIN_SLOTSYNC_NAPTIME_MS and
MAX_SLOTSYNC_NAPTIME_MS in such a case.
3)
+ * target_slot_list - List of RemoteSlot structures to refresh, or NIL to
+ * fetch all failover slots
Can we please change it to:
List of failover logical slots to fetch from primary, or NIL to fetch
all failover logical slots
4)
In the worker, before each call to synchronize_slots(), we are
starting a new transaction. It aligns with the previous implementation
where StartTransaction was inside synchronize_slots(). But in API, we
are doing StartTransaction once outside of the loop instead of doing
before each synchronize_slots(), is it intentional? It may keep the
transaction open for a long duration for the case where slots are not
getting persisted soon.
5)
With ProcessSlotSyncInterrupts() being removed from API, can you
please check the behaviour of API on smart-shutdown and rest of the
shutdown modes? It should behave like other APIs. And what happens if
I change primary_conninfo to some non-existing server when the API is
running. Does it error out or keep retrying?
thanks
Shveta
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2025-08-29 05:23:27 | Re: Remove unneeded cast in heap_xlog_lock. |
Previous Message | Masahiko Sawada | 2025-08-29 04:28:59 | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |