RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2023-11-29 09:12:54
Message-ID: OS0PR01MB571693E14A4460D9BFC23FF19483A@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, November 29, 2023 11:04 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:

Thanks for the comments.

> ======
> 1. General.
>
> Previously (see [1] #0) I asked a question about if there is some documentation
> missing. Seems not yet answered.

The document was add in V39-0002 in logicaldecoding.sgml
because some necessary GUCs for slotsync are not in 0001.

>
> ======
> Commit message
>
> 2.
> Users can set this flag during CREATE SUBSCRIPTION or during
> pg_create_logical_replication_slot API. Examples:
>
> CREATE SUBSCRIPTION mysub CONNECTION '..' PUBLICATION mypub WITH
> (failover = true);
>
> (failover is the last arg)
> SELECT * FROM pg_create_logical_replication_slot('myslot',
> 'pgoutput', false, true, true);
>
> ~
>
> I felt it is better to say "Ex1" / "Ex2" (or "1" / "2" or something
> similar) to indicate better where these examples start and finish, otherwise they
> just sort of get lost among the text.

Changed.

>
> ======
> doc/src/sgml/catalogs.sgml
>
> 3.
> From previous review ([1] #6) I suggested reordering fields. Hous-san
> wrote: "but I think the functionality of two fields are different and I didn’t find
> the correlation between two fields except for the type of value."
>
> Yes, that is true. OTOH, I felt grouping the attributes by the same types made
> the docs easier to read.

The document's order should be same as the pg_subscription catalog, and I
prefer not to move the new subfailoverstate in the middle of catalog as the
functionality of them is different.

>
> ======
> src/backend/commands/subscriptioncmds.c
>
> 4. CreateSubscription
>
> + /*
> + * If only the slot_name is specified (without create_slot option),
> + * it is possible that the user intends to use an existing slot on
> + * the publisher, so here we enable failover for the slot if
> + * requested.
> + */
> + else if (opts.slot_name && failover_enabled) {
>
> Unanswered question from previous review (see [1] #11a). i.e. How does this
> condition ensure that *only* the slot name was specified (like the comment is
> saying)?

It is the else part of 'if (opts.create_slot)', so it means create_slot is not
specified while only slot_name is specified. I have improved the comment.

>
> ~~~
>
> 5. AlterSubscription
>
> errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed
> when two_phase is enabled"),
> errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh = false,
> or with copy_data = false, or use DROP/CREATE SUBSCRIPTION.")));
>
> + if (sub->failoverstate == LOGICALREP_FAILOVER_STATE_ENABLED &&
> + opts.copy_data) ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed
> when failover is enabled"),
> + errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh =
> false, or with copy_data = false, or use DROP/CREATE SUBSCRIPTION.")));
> +
>
> There are translations issues same as reported in my previous review (see [1]
> #12b and also several other places as noted in [1]). Hou-san replied that I "can
> start a separate thread to change the twophase related messages, and we can
> change accordingly if it's accepted.", but that's not right IMO because it is only
> the fact that this sysncslot patch is reusing a similar message that warrants the
> need to extract a "common" message part in the first place. So I think it is
> responsibility if this sycslot patch to make this change.

OK, changed.

>
> ======
> src/backend/replication/logical/tablesync.c
>
> 6. process_syncing_tables_for_apply
>
> + if (MySubscription->twophasestate ==
> + LOGICALREP_TWOPHASE_STATE_PENDING)
> + ereport(LOG,
> + (errmsg("logical replication apply worker for subscription \"%s\"
> will restart so that two_phase can be enabled",
> + MySubscription->name)));
> +
> + if (MySubscription->failoverstate ==
> + LOGICALREP_FAILOVER_STATE_PENDING)
> + ereport(LOG,
> + (errmsg("logical replication apply worker for subscription \"%s\"
> will restart so that failover can be enabled",
> + MySubscription->name)));
>
> 6a.
> You may end up log 2 restart messages for the same restart. Is it OK?

I think it's OK as it can provide complete information.

>
> ~
>
> 6b.
> This is another example where you should share the same common message
> (for less translations)

I adjusted the message there.

>
> ======
> src/backend/replication/logical/worker.c
>
> 7.
> + * The logical slot on the primary can be synced to the standby by
> + specifying
> + * the failover = true when creating the subscription. Enabling
> + failover allows
> + * us to smoothly transition to the standby in case the primary gets
> + promoted,
> + * ensuring that we can subscribe to the new primary without losing any data.
>
> /the failover = true/the failover = true option/
>
> or
>
> /the failover = true/failover = true/
>

Changed.

> ~~~
>
> 8.
> +
> #include "postgres.h"
>
> Unnecessary extra blank line

Removed.

>
> ======
> src/backend/replication/slot.c
>
> 9. validate_standby_slots
>
> There was no reply to the comment in my previous review (see [1] #27).
> Maybe you disagree or maybe accidentally overlooked?

The error message has already been adjusted in V39.
I adjusted the check in this version as well to be consistent.

>
> ~~~
>
> 10. check_standby_slot_names
>
> In previous review I asked ([1] #28) why a special check was needed for "*".
> Hou-san replied that "SplitIdentifierString() does not give error for '*' and '*'
> can be considered as valid value which if accepted can mislead user".
>
> Sure, but won't the code then just try to find if there is a replication slot called
> "*" and that will fail. That was my point, if the slot name lookup is going to fail
> anyway then why have the extra code for the special "*" case up-front? Note --
> I haven't tried it, so maybe code doesn't work like I think it does.

I think allowing "*" can mislead user because it normally means every slot, but
we don't want to support the "every slot" option as mentioned in the comment.
So I think reject it here is fine. Reporting ERROR because the slot named '*' was not
there may look confusing.

>
> ======
> src/backend/replication/walsender.c
>
> 11. PhysicalWakeupLogicalWalSnd
>
> No reply to my previous review comment ([1] #33). Not done? Disagreed, or
> accidentally missed?

The function mentioned in your previous comment has been removed in
previous version, so I am not sure are you pointing to some other codes
that has similar issues ?

>
> ~~~
>
> 12. WalSndFilterStandbySlots
>
> + /*
> + * If logical slot name is given in standby_slot_names, give WARNING
> + * and skip it. Since it is harmless, so WARNING should be enough, no
> + * need to error-out.
> + */
> + else if (SlotIsLogical(slot))
> + warningfmt = _("cannot have logical replication slot \"%s\" in
> parameter \"%s\", ignoring");
>
> I previously raised an issue (see [1] #35) thinking this could not happen.
> Hou-san explained how it might happen ("user could drop the logical slot and
> recreate a physical slot with the same name without changing the GUC.") so this
> code was necessary. That is OK, but I think your same explanation in the code
> commen.

OK, I have added comments here.

>
> ~~~
>
> 13. WalSndFilterStandbySlots
>
> + standby_slots_cpy = foreach_delete_current(standby_slots_cpy, lc);
>
> I previously raised issue (see [1] #36). Hou-san replied "I think it's OK to remove
> slots if it's invalidated, dropped, or was changed to logical one as we don't
> need to wait for these slots to catch up anymore."
>
> Sure, maybe code is fine, but my point was that the code is removing elements
> *more* scenarios than are mentioned by the function comment, so maybe
> update that function comment for all the removal scenarios.

Updated the comments.

>
> ~~~
>
> 14. WalSndWaitForStandbyConfirmation
>
> The comment change from my previous review ([1] #37) not done.
> Disagreed, or accidentally missed?

Thanks for pointing, this was missed.

>
> ~~~
>
> 15. WalSndWaitForStandbyConfirmation
>
> The question about calling ConditionVariablePrepareToSleep in my previous
> review ([1] #39) not answered. Accidentally missed?

I think V39 has already adjusted the order of reload and NIL check in this function.

>
> ~~~
>
> 16. WalSndWaitForWal
>
> if (RecentFlushPtr != InvalidXLogRecPtr &&
> loc <= RecentFlushPtr)
> - return RecentFlushPtr;
> + {
> + WalSndFilterStandbySlots(RecentFlushPtr, &standby_slots);
>
> It is better to use XLogRecPtrIsInvalid macro here. I know it was not strictly
> added by your patch, but so much else changed nearby so I thought this should
> be fixed at the same time.

Changed.

>
> ======
> src/bin/pg_upgrade/info.c
>
> 17. get_old_cluster_logical_slot_infos
>
> +
> slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * num_slots);
>
> Excessive whitespace.

Removed.

Best Regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2023-11-29 09:17:04 RE: Synchronizing slots from primary to standby
Previous Message Zhijie Hou (Fujitsu) 2023-11-29 09:11:30 RE: Synchronizing slots from primary to standby