RE: Improve pg_sync_replication_slots() to wait for primary to advance

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Yilin Zhang <jiezhilove(at)126(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Improve pg_sync_replication_slots() to wait for primary to advance
Date: 2026-02-10 07:55:19
Message-ID: TY4PR01MB1690752448C843C7DD4474FB89462A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, February 10, 2026 3:02 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Mon, Feb 9, 2026 at 11:36 AM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Adjusted the comments as suggested.
> >
> > In addition to addressing the comments, I revisited the recently
> > updated slotsync code and noticed opportunities to simplify some
> > parameters, checks, and codes. This will also facilitate the improvement in
> v2-0001 coding.
> >
> > * Previously, certain function parameters(found_consistent_snapshot,
> > remote_slot_precedes of update_local_synced_slot()) were used to store
> > the reason for slot synchronization being skipped. However, now that a
> > slot property serves this purpose, we can simplify the code by
> > eliminating those redundant parameters and using the slot's property to
> perform the same check.
> >
> > * The slot synchronization is skipped if the required WAL has not been
> > received and flushed. Previously, this check[1] was performed in two
> separate code paths.
> > Such duplication can lead to coding errors if changes are made in one
> > location without updating the other, as exemplified by the issue fixed in
> commit 3df4df5.
> > This commit consolidates the check into a single location to eliminate
> > redundancies and reduce the potential for future errors.
> >
> > To address these points, I have created two additional patches:
> > V3-0001 for the first point and V3-0002 for the second. V3-0003
> > contains the current improvement being discussed, and it's also simplified
> thanks to the preceding patches.
> >
> I like the idea of both the new patches. Please find a few trivial comments:

Thanks for the comments.

>
> patch002:
> 1)
> Earlier at both the places where we were updating
> 'SS_SKIP_WAL_NOT_FLUSHED', we were returning slot_updated as false,
> now, we might end up returning it as true (specially at second occurrence). Is
> this intentional?

Yes. I think it's OK in the second occurrence because we did create a new temp
slot and give some initial value for the slot. I think it's similar to
SS_SKIP_WAL_OR_ROWS_REMOVED and SS_SKIP_NO_CONSISTENT_SNAPSHOT where we also
return slot_updated=true in case of initial sync.

>
> 2)
> In update_and_persist_local_synced_slot(), we can reach this even when
> wal_not_flushed, so we shall to update comment:
> if (slot->slotsync_skip_reason != SS_SKIP_NONE)
> {
> /*
> * We reach here when the remote slot didn't catch up to locally
> * reserved position, or it cannot reach the consistent point from the
> * restart_lsn.
> ....
> */

Updated.

>
> Patch003:
> 3)
> + if (slotsync_pending && slot->slotsync_skip_reason != SS_SKIP_NONE)
> + *slotsync_pending = true;
>
> Here shall we ensure by a sanity check that slotsync_skip_reason !=
> SS_SKIP_INVALID?

Added an Assert for it.

> And please bring back the comment as well, which was
> there in an earlier patch which stated the reason for not including
> 'SS_SKIP_INVALID' here.

After rethinking, I chose to add the comments atop of file
along with other related comments.

Best Regards,
Hou zj

Attachment Content-Type Size
v4-0004-Add-a-taptest.patch application/octet-stream 3.3 KB
v4-0002-Refactoring-move-similar-checks-to-a-central-plac.patch application/octet-stream 6.2 KB
v4-0001-Refactoring-remove-some-unnecessary-func-paramete.patch application/octet-stream 6.7 KB
v4-0003-Improve-the-retry-logic-in-pg_sync_replication_sl.patch application/octet-stream 9.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2026-02-10 08:02:53 Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Previous Message Antonin Houska 2026-02-10 07:46:27 Re: Buffer locking is special (hints, checksums, AIO writes)