| From: | Ajin Cherian <itsajin(at)gmail(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
| Date: | 2025-12-10 05:07:34 |
| Message-ID: | CAFPTHDbdKU_5y0H8RPZNkbCQUUwdCDbWTdhD7tjc5geG37CF+g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Dec 10, 2025 at 3:05 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Wed, Dec 10, 2025 at 8:10 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > On Wed, Dec 10, 2025 at 1:29 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
> > >
> > > Hi Ajin,
> > >
> > > I’d like to revisit this patch, but looks like 04396eacd3faeaa4fa3d084a6749e4e384bdf0db has some conflicts to this patch. So can you please rebase this patch?
> > >
> > > Best regards,
> > > --
> >
> > It's been rebased. Have a look at the latest version.
> >
>
> Few comments on 001:
>
> 1)
> /*
> * Emit an error if a promotion or a concurrent sync call is in progress.
> * Otherwise, advertise that a sync is in progress.
> */
> static void
> check_and_set_sync_info
>
> We need to change this comment because now this function does not
> handle promotion case.
Fixed.
>
> 2)
> + if (sync_process_pid!= InvalidPid)
> + kill(sync_process_pid, SIGUSR1);
>
> We need to have space between sync_process_pid and '!='
>
Fixed.
> 3)
> + * Exit or throw errors if relevant GUCs have changed depending on whether
>
> errors->error
>
Fixed.
> 4)
> In slotsync_reread_config(), even when we mark parameter_changed=true
> in the first if-block, we still go to the second if-block which was
> not needed. So shall we make second if-block as else-if to avoid
> this? Thoughts?
>
Changed as suggested.
> 5)
> As discussed in [1], we can make this change in ProcessSlotSyncInterrupts():
>
> 'replication slot synchronization worker is shutting down because
> promotion is triggered'
> to
> 'replication slot synchronization worker will stop because promotion
> is triggered'
>
> [1]: https://www.postgresql.org/message-id/6AE56C64-F760-4CBD-BABF-72633D3F7B5E%40gmail.com
>
Changed.
On Wed, Dec 10, 2025 at 3:27 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
>
>
>
> Here are some comments for v32.
>
> 1 - 0001
> ```
> - * The slot sync worker's pid is needed by the startup process to shut it
> - * down during promotion. The startup process shuts down the slot sync worker
> - * and also sets stopSignaled=true to handle the race condition when the
> + * The pid is either the slot sync worker's pid or the backend's pid running
> ```
>
> I think we should add single quotes for “pid” and “stopSignaled". Looking at other comment lines, structure fields all have single-quoted:
> ```
> * The 'syncing' flag is needed to prevent concurrent slot syncs to avoid slot
>
> * The 'last_start_time' is needed by postmaster to start the slot sync worker
> ```
>
Changed as suggested.
> 2 - 0001 - the same code block as 1
>
> I wonder how to distinct if the “pid” is a slot sync worker or a backend process?
>
No, there is now way currently and iis not really required with the
current logic.
> 3 - 0001
> ```
> + bool worker = AmLogicalSlotSyncWorkerProcess();
> ```
>
> The variable name “worker” doesn’t indicate a bool type, maybe renamed to “is_slotsync_worker”.
>
Changed as suggested.
> 4 - 0001
> ```
> + /*
> + * If we have reached here with a parameter change, we must be running in
> + * SQL function, emit error in such a case.
> + */
> + if (parameter_changed)
> + {
> + Assert(!worker);
> + ereport(ERROR,
> + errmsg("replication slot synchronization will stop because of a parameter change"));
> }
> ```
>
> The Assert(!worker) feels redundant, because it then immediately error out.
>
I don't think it is redundant as Asserts are used to catch unexpected
code paths in testing.
> 5 - 0001
> ```
> + * Exit or throw errors if relevant GUCs have changed depending on whether
> + * called from slotsync worker or from SQL function pg_sync_replication_slots()
> ```
>
> Let’s change “slotsync” to “slot sync” because elsewhere comments all use “slot sync”, just to keep consistent.
>
Changed.
> 6 - 0001
> ```
> - * Interrupt handler for main loop of slot sync worker.
> + * Interrupt handler for main loop of slot sync worker and
> + * SQL function pg_sync_replication_slots().
> ```
>
> Missing “the” before “SQL function”. This comment applies to multiple places.
>
Changed.
> 7 - 0001
> ```
> + if (sync_process_pid!= InvalidPid)
> + kill(sync_process_pid, SIGUSR1);
> ```
>
> Nit: missing a white space before “!="
>
Fixed.
> 8 - 0002
> ```
> + if (slot_names != NIL)
> {
> - StartTransactionCommand();
> - started_tx = true;
> + bool first_slot = true;
> +
> + /*
> + * Construct the query to fetch only the specified slots
> + */
> + appendStringInfoString(&query, " AND slot_name IN (");
> +
> + foreach_ptr(char, slot_name, slot_names)
> + {
> + if (!first_slot)
> + appendStringInfoString(&query, ", ");
> +
> + appendStringInfo(&query, "'%s'", slot_name);
> + first_slot = false;
> + }
> + appendStringInfoChar(&query, ')');
> }
> ```
>
> The logic of appending “, “ can be slightly simplified as:
> ```
> If (slot_names != NIL)
> {
> Const char *sep = “”;
> appendStringInfoString(&query, " AND slot_name IN (“);
> foreach_ptr(char, slot_name, slot_names)
> {
> appendStringInfo(&query, “%s'%s'", sep, slot_name);
> sep = “, “;
> }
> }
> ```
>
> That saves a “if” check and a appendStringInfoString().
I'm not sure if this is much of an improvement, I like the current
approach and matches with similar coding patterns in the code base.
Attaching v34 addressing the above comments.
regards,
Ajin Cherian
Fujitsu Australia
| Attachment | Content-Type | Size |
|---|---|---|
| v34-0001-Signal-backends-running-pg_sync_replication_slot.patch | application/octet-stream | 10.5 KB |
| v34-0002-Improve-initial-slot-synchronization-in-pg_sync_.patch | application/octet-stream | 22.3 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Thomas Munro | 2025-12-10 05:10:20 | greenfly lwlock corruption in REL_14_STABLE and REL_15_STABLE |
| Previous Message | cca5507 | 2025-12-10 05:01:18 | Re: [BUG] Incorrect historic snapshot may be serialized to disk during fast-forwarding |