RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2023-12-08 12:05:50
Message-ID: OS0PR01MB57160AA44EE6E3BBEDB67F84948AA@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, December 7, 2023 7:37 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Dec 6, 2023 at 4:53 PM shveta malik <shveta(dot)malik(at)gmail(dot)com>
> wrote:
> >
> > v43-001:
> > 1) Support of 'failover' dump in pg_dump. It was missing earlier.
> >
>
> Review v43-0001
> ================
> 1.
> + * However, we do not enable failover for slots created by the table
> + sync
> + * worker. This is because the table sync slot might not be fully
> + synced on the
> + * standby.
>
> The reason for not enabling failover for table sync slots is not clearly
> mentioned.
>
> 2.
> During syncing, the local restart_lsn and/or local catalog_xmin of
> + * the newly created slot on the standby are typically ahead of those
> + on the
> + * primary. Therefore, the standby needs to wait for the primary
> + server's
> + * restart_lsn and catalog_xmin to catch up, which takes time.
>
> I think this part of the comment should be moved to 0002 patch. We can
> probably describe a bit more about why slot on standby will be ahead and
> about waiting time.
>
> 3.
> validate_standby_slots()
> {
> ...
> + slot = SearchNamedReplicationSlot(name, true);
> +
> + if (!slot)
> + goto ret_standby_slot_names_ng;
> +
> + if (!SlotIsPhysical(slot))
> + {
> + GUC_check_errdetail("\"%s\" is not a physical replication slot",
> + name); goto ret_standby_slot_names_ng; }
>
> Why the first check (slot not found) doesn't have errdetail? The goto's in this
> function look a bit odd, can we try to avoid those?
>
> 4.
> + /* Verify syntax and parse string into list of identifiers */ if
> + (!SplitIdentifierString(rawname, ',', &elemlist)) {
> + /* syntax error in name list */
> + GUC_check_errdetail("List syntax is invalid.");
> ...
> ...
> + if (!SplitIdentifierString(standby_slot_names_cpy, ',',
> + &standby_slots)) {
> + /* This should not happen if GUC checked check_standby_slot_names. */
> + elog(ERROR, "invalid list syntax");
>
> Both are checking the same string but giving different error messages.
> I think the error message should be the same in both cases. The first one seems
> better.
>
> 5. In WalSndFilterStandbySlots(), the comments around else if checks should
> move inside the checks. It is hard to read the code in the current format. I have
> tried to change the same in the attached.
>
> Apart from the above, I have changed the comments and made some minor
> cosmetic changes in the attached. Kindly include in next version if you are fine
> with it.

Thanks for the comments and changes, I have addressed them.
Here is the V44 patch set which addressed comments above and [1].

The new version patches also include the follow changes:

V44-0001
* Let the pg_replication_slot_advance also wait for the slots specified
in standby_slot_names to catch up.
* added few test cases to cover the wait/wakeup logic in
walsender related to standby_slot_names.
* ran pgindent.

V44-0002
* added few comments to explain the case when the slot is valid on primary
while is invalidated on standby.

Thanks Ajin for analyzing and making the tests.

The pending comments on 0002 will be addressed in next version.

[1] https://www.postgresql.org/message-id/CAHut%2BPvRD5V-zzTvffDdcnqB1T4JNATKGgw%2BwdQCKAgeCYr0xQ%40mail.gmail.com

Best Regards,
Hou zj

Attachment Content-Type Size
v44-0003-Allow-slot-sync-worker-to-wait-for-the-cascading.patch application/octet-stream 7.9 KB
v44-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch application/octet-stream 145.9 KB
v44-0002-Add-logical-slot-sync-capability-to-the-physical.patch application/octet-stream 84.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2023-12-08 12:33:33 Re: Refactoring backend fork+exec code
Previous Message Charan K 2023-12-08 12:04:56 Assistance Needed: PostgreSQL Migration Errors 13.2 to 15