Re: Improve pg_sync_replication_slots() to wait for primary to advance

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date: 2025-12-04 06:03:50
Message-ID: CAJpy0uB8zM5xTuCHLRqqyS9o2miq_R9p8xDGonCTJQ634h+KCQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 4, 2025 at 10:51 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Wed, Dec 3, 2025 at 10:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Dec 3, 2025 at 8:51 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > >
> > > Attaching patch v28 addressing these comments.
> > >
> >
> > Can we extract the part of the patch that handles SIGUSR1 signal
> > separately as a first patch and the remaining as a second patch?
> > Please do mention the reason in the commit message as to why we are
> > changing the signal from SIGINT to SIGUSR1.
> >
>
> I have extracted the SIGUSR1 signal handling changes into a separate
> patch and am sharing it here. I will share the next patch later.
> Let me know if there are any comments for this patch.
>

I have just 2 trivial comments for v29-001:

1)
-   * receives a SIGINT from the startup process, or when there is an error.
+   * receives a SIGUSR1 from the startup process, or when there is an error.

In the above comment, we should mention stopSignaled rather than
SIGUSR1, as SIGUSR1 is just a wakeup signal, not a termination signal.
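To make the distinction concrete, here is a minimal plain-C sketch of the flag-plus-wakeup pattern, assuming stopSignaled is set before the signal is sent. It uses ISO C signal() and raise() rather than PostgreSQL's latch and shared-memory machinery, and the names stop_signaled, wakeup_pending, request_stop() and should_stop() are hypothetical. The handler only records that a wakeup happened; whether to stop is decided solely by reading the flag.

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for stopSignaled.  In the real code the flag is
 * in shared memory and set by the startup process; here it is a global.
 */
static volatile sig_atomic_t stop_signaled = 0;

/* Set by the SIGUSR1 handler; it only means "wake up and re-check". */
static volatile sig_atomic_t wakeup_pending = 0;

static void
handle_sigusr1(int signo)
{
    /* Re-arm in case signal() has one-shot (System V) semantics here. */
    signal(signo, handle_sigusr1);
    wakeup_pending = 1;         /* no stop decision is made in the handler */
}

static void
install_wakeup_handler(void)
{
    signal(SIGUSR1, handle_sigusr1);
}

/* Requesting side: publish the stop request first, then wake the target. */
static void
request_stop(void)
{
    stop_signaled = 1;
    raise(SIGUSR1);
}

/*
 * Receiving side, called from the main loop after any wakeup: returns
 * true only when a stop was actually requested, not merely because
 * SIGUSR1 arrived.
 */
static bool
should_stop(void)
{
    wakeup_pending = 0;         /* consume the wakeup, if any */
    return stop_signaled != 0;
}
```

A bare SIGUSR1 wakes the process but should_stop() still returns false; only after request_stop() has set the flag does it return true, which is why the comment should refer to the flag rather than the signal.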

2)
+    else
+      ereport(ERROR,
+          errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+          errmsg("cannot continue replication slot synchronization"
+             " as standby promotion is triggered"));

Please mention in the comment for the else block that this path is
for the SQL function.
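A minimal sketch of the branch in question, assuming the stop check distinguishes the slot sync worker from a backend running pg_sync_replication_slots(). The enum and function names are hypothetical, not the patch's; the message texts are the ones seen in the logs below. Labelling the else branch as the SQL-function path makes it clear why it raises an ERROR instead of exiting.

```c
#include <stdbool.h>

/* Hypothetical: who is running the slot synchronization loop. */
typedef enum
{
    SYNC_BY_WORKER,             /* the slot sync background worker */
    SYNC_BY_SQL_FUNCTION        /* pg_sync_replication_slots() in a backend */
} SyncCaller;

typedef enum
{
    ACTION_CONTINUE,
    ACTION_EXIT_WITH_LOG,       /* worker: log and shut down */
    ACTION_RAISE_ERROR          /* SQL function: report an error to the client */
} SyncAction;

/* Hypothetical analogue of the stop check in the interrupt-processing path. */
static SyncAction
check_stop(bool stop_signaled, SyncCaller caller, const char **msg)
{
    *msg = 0;
    if (!stop_signaled)
        return ACTION_CONTINUE;

    if (caller == SYNC_BY_WORKER)
    {
        *msg = "replication slot synchronization worker is shutting down"
            " as promotion is triggered";
        return ACTION_EXIT_WITH_LOG;
    }
    else
    {
        /* SQL function (pg_sync_replication_slots()): raise an ERROR. */
        *msg = "cannot continue replication slot synchronization"
            " as standby promotion is triggered";
        return ACTION_RAISE_ERROR;
    }
}
```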

~~

I tested the affected scenarios; here are the logs:

a)
When promotion is ongoing and the startup process has terminated the
slot sync worker, the postmaster may start the slot sync worker again
if it has not yet noticed the promotion. In that scenario, we get:

11:03:19.712 IST [151559] LOG: replication slot synchronization worker is shutting down as promotion is triggered
11:03:19.726 IST [151629] LOG: slot sync worker started
11:03:19.795 IST [151629] LOG: replication slot synchronization worker is shutting down as promotion is triggered

b)
On promotion, the API gets this error (now raised from ProcessSlotSyncInterrupts()):
postgres=# SELECT pg_sync_replication_slots();
ERROR: cannot continue replication slot synchronization as standby promotion is triggered

c)
If any parameter is changed between ValidateSlotSyncParams() and
ProcessSlotSyncInterrupts() for the API, we get this:
postgres=# SELECT pg_sync_replication_slots();
ERROR: replication slot synchronization will stop because of a parameter change

-- On re-run (raised from ValidateSlotSyncParams()):
postgres=# SELECT pg_sync_replication_slots();
ERROR: replication slot synchronization requires "hot_standby_feedback" to be enabled

~~

The tested scenarios' behaviour looks good to me.

thanks
Shveta
