Re: Synchronizing slots from primary to standby

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-11-29 03:04:22
Message-ID: CAHut+PsuSWjm7U_sVnL8FXZ7ZQcfCcT44kAK7i6qMG35Cwjy3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi. Here are some review comments for the patch v39_2-0001.

Multiple items from my previous review [1] seemed unanswered, so it
wasn't clear if they were discarded because they were wrong or maybe
accidently missed. I've repeated all those again here, as well as some
new comments.

======
1. General.

Previously (see [1] #0) I asked a question about if there is some
documentation missing. Seems not yet answered.

======
Commit message

2.
Users can set this flag during CREATE SUBSCRIPTION or during
pg_create_logical_replication_slot API. Examples:

CREATE SUBSCRIPTION mysub CONNECTION '..' PUBLICATION mypub
WITH (failover = true);

(failover is the last arg)
SELECT * FROM pg_create_logical_replication_slot('myslot',
'pgoutput', false, true, true);

~

I felt it is better to say "Ex1" / "Ex2" (or "1" / "2" or something
similar) to indicate better where these examples start and finish,
otherwise they just sort of get lost among the text.

======
doc/src/sgml/catalogs.sgml

3.
From previous review ([1] #6) I suggested reordering fields. Hous-san
wrote: "but I think the functionality of two fields are different and
I didn’t find the correlation between two fields except for the type
of value."

Yes, that is true. OTOH, I felt grouping the attributes by the same
types made the docs easier to read.

======
src/backend/commands/subscriptioncmds.c

4. CreateSubscription

+ /*
+ * If only the slot_name is specified (without create_slot option),
+ * it is possible that the user intends to use an existing slot on
+ * the publisher, so here we enable failover for the slot if
+ * requested.
+ */
+ else if (opts.slot_name && failover_enabled)
+ {

Unanswered question from previous review (see [1] #11a). i.e. How does
this condition ensure that *only* the slot name was specified (like
the comment is saying)?

~~~

5. AlterSubscription

errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed
when two_phase is enabled"),
errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh =
false, or with copy_data = false, or use DROP/CREATE
SUBSCRIPTION.")));

+ if (sub->failoverstate == LOGICALREP_FAILOVER_STATE_ENABLED && opts.copy_data)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("ALTER SUBSCRIPTION with refresh and copy_data is not allowed
when failover is enabled"),
+ errhint("Use ALTER SUBSCRIPTION ... SET PUBLICATION with refresh =
false, or with copy_data = false, or use DROP/CREATE
SUBSCRIPTION.")));
+

There are translations issues same as reported in my previous review
(see [1] #12b and also several other places as noted in [1]). Hou-san
replied that I "can start a separate thread to change the twophase
related messages, and we can change accordingly if it's accepted.",
but that's not right IMO because it is only the fact that this
sysncslot patch is reusing a similar message that warrants the need to
extract a "common" message part in the first place. So I think it is
responsibility if this sycslot patch to make this change.

======
src/backend/replication/logical/tablesync.c

6. process_syncing_tables_for_apply

+ if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING)
+ ereport(LOG,
+ (errmsg("logical replication apply worker for subscription \"%s\"
will restart so that two_phase can be enabled",
+ MySubscription->name)));
+
+ if (MySubscription->failoverstate == LOGICALREP_FAILOVER_STATE_PENDING)
+ ereport(LOG,
+ (errmsg("logical replication apply worker for subscription \"%s\"
will restart so that failover can be enabled",
+ MySubscription->name)));

6a.
You may end up log 2 restart messages for the same restart. Is it OK?

~

6b.
This is another example where you should share the same common message
(for less translations)

======
src/backend/replication/logical/worker.c

7.
+ * The logical slot on the primary can be synced to the standby by specifying
+ * the failover = true when creating the subscription. Enabling failover allows
+ * us to smoothly transition to the standby in case the primary gets promoted,
+ * ensuring that we can subscribe to the new primary without losing any data.

/the failover = true/the failover = true option/

or

/the failover = true/failover = true/

~~~

8.
+
#include "postgres.h"

Unnecessary extra blank line

======
src/backend/replication/slot.c

9. validate_standby_slots

There was no reply to the comment in my previous review (see [1] #27).
Maybe you disagree or maybe accidentally overlooked?

~~~

10. check_standby_slot_names

In previous review I asked ([1] #28) why a special check was needed
for "*". Hou-san replied that "SplitIdentifierString() does not give
error for '*' and '*' can be considered as valid value which if
accepted can mislead user".

Sure, but won't the code then just try to find if there is a
replication slot called "*" and that will fail. That was my point, if
the slot name lookup is going to fail anyway then why have the extra
code for the special "*" case up-front? Note -- I haven't tried it, so
maybe code doesn't work like I think it does.

======
src/backend/replication/walsender.c

11. PhysicalWakeupLogicalWalSnd

No reply to my previous review comment ([1] #33). Not done? Disagreed,
or accidentally missed?

~~~

12. WalSndFilterStandbySlots

+ /*
+ * If logical slot name is given in standby_slot_names, give WARNING
+ * and skip it. Since it is harmless, so WARNING should be enough, no
+ * need to error-out.
+ */
+ else if (SlotIsLogical(slot))
+ warningfmt = _("cannot have logical replication slot \"%s\" in
parameter \"%s\", ignoring");

I previously raised an issue (see [1] #35) thinking this could not
happen. Hou-san explained how it might happen ("user could drop the
logical slot and recreate a physical slot with the same name without
changing the GUC.") so this code was necessary. That is OK, but I
think your same explanation in the code commen.

~~~

13. WalSndFilterStandbySlots

+ standby_slots_cpy = foreach_delete_current(standby_slots_cpy, lc);

I previously raised issue (see [1] #36). Hou-san replied "I think it's
OK to remove slots if it's invalidated, dropped, or was changed to
logical one as we don't need to wait for these slots to catch up
anymore."

Sure, maybe code is fine, but my point was that the code is removing
elements *more* scenarios than are mentioned by the function comment,
so maybe update that function comment for all the removal scenarios.

~~~

14. WalSndWaitForStandbyConfirmation

The comment change from my previous review ([1] #37) not done.
Disagreed, or accidentally missed?

~~~

15. WalSndWaitForStandbyConfirmation

The question about calling ConditionVariablePrepareToSleep in my
previous review ([1] #39) not answered. Accidentally missed?

~~~

16. WalSndWaitForWal

if (RecentFlushPtr != InvalidXLogRecPtr &&
loc <= RecentFlushPtr)
- return RecentFlushPtr;
+ {
+ WalSndFilterStandbySlots(RecentFlushPtr, &standby_slots);

It is better to use XLogRecPtrIsInvalid macro here. I know it was not
strictly added by your patch, but so much else changed nearby so I
thought this should be fixed at the same time.

======
src/bin/pg_upgrade/info.c

17. get_old_cluster_logical_slot_infos

+
slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) *
num_slots);

Excessive whitespace.

======
[1] My previous review of v35-0001.
https://www.postgresql.org/message-id/CAHut%2BPv-yu71ogj_hRi6cCtmD55bsyw7XTxj1Nq8yVFKpY3NDQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-11-29 03:16:52 Re: common signal handler protection
Previous Message Amit Kapila 2023-11-29 02:58:46 Re: Synchronizing slots from primary to standby