From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
---|---|
To: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Cc: | Ajin Cherian <itsajin(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
Date: | 2025-08-14 12:44:29 |
Message-ID: | CAExHW5syhJm6qhXCtmNAP-=d9qp+SZMtniYCoW7C1dzT7pMJSw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Aug 14, 2025 at 12:14 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
>
> 8)
> + /* If refreshing, free the original list structures */
> + if (is_refresh)
> + {
> + foreach(lc, remote_slot_list)
> + {
> + RemoteSlot *old_slot = (RemoteSlot *) lfirst(lc);
> + pfree(old_slot);
> + }
> + list_free(remote_slot_list);
> + }
>
> We can get rid of 'is_refresh' and can simply check if
> 'target_slot_list != NIL', free it. We can use list_free_deep instead
> of freeing each element. Having said that, it looks slightly odd to
> free the list in this function, I will think more here. Meanwhile, we
> can do this.
>
+1. The function prologue doesn't mention that the original list is
deep freed. So a caller may try to access it after this call, which
will lead to a crash. As a safe programming practice we should let the
caller free the original list if it is not needed anymore OR modify
the input list in-place and return it for the convenience of the
caller like all list_* interfaces. At least we should document this
behavior in the function prologue. You could also use foreach_ptr
instead of foreach.
> 13)
> - errmsg("could not synchronize replication slot \"%s\"",
> - remote_slot->name),
> - errdetail("Synchronization could lead to data loss, because the
> remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the
> standby has LSN %X/%08X and catalog xmin %u.",
> - LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> - remote_slot->catalog_xmin,
> - LSN_FORMAT_ARGS(slot->data.restart_lsn),
> - slot->data.catalog_xmin));
> + errmsg("Replication slot \"%s\" is not sync ready; will keep retrying",
> + remote_slot->name),
> + errdetail("Attempting Synchronization could lead to data loss,
> because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u,
> but the standby has LSN %X/%08X and catalog xmin %u.",
> + LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> + remote_slot->catalog_xmin,
> + LSN_FORMAT_ARGS(slot->data.restart_lsn),
> + slot->data.catalog_xmin));
>
> We can retain the same message as it was put after a lot of
> discussion. We can attempt to change if others comment. The idea is
> since a worker dumps it in each subsequent cycle (if such a situation
> arises), on the same basis now the API can also do so because it is
> also performing multiple cycles now. Earlier I had suggested changing
> it for API based on messages 'continuing to wait..' which are no
> longer there now.
Also we usually don't use capital letters at the start of the error
message. Any reason this is different?
Some more
+ * When called from pg_sync_replication_slots, use a fixed 2
+ * second wait time.
Function prologue doesn't mention this. Probably the prologue should
contain only the first sentence there. Rest of the prologues just
repeat comments in the function. The function is small enough that a
reader could read the details from the function instead of the
prologue.
+ wait_time = SLOTSYNC_API_NAPTIME_MS;
+ } else {
} else and { should be on separate lines.
--
Best Wishes,
Ashutosh Bapat
From | Date | Subject | |
---|---|---|---|
Next Message | Hayato Kuroda (Fujitsu) | 2025-08-14 12:49:23 | RE: memory leak in logical WAL sender with pgoutput's cachectx |
Previous Message | Tom Lane | 2025-08-14 12:30:51 | Re: cfbot mistakenly reports that a rebase is needed |