Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2023-07-14 08:23:50
Message-ID: CAHut+PuMAiO_X_Kw6ud-jr5WOm+rpkdu7CppDU6mu=gY7UVMzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Kuroda-san.

Here are some review comments for the v17-0003 patch. They are all minor.

======
Commit message

1.
Previously tablesync workers establish new connections when it changes
the syncing
table, but this might have additional overhead. This patch allows to
reuse connections
instead.

~

/This patch allows to reuse connections instead./This patch allows the
existing connection to be reused./

~~~

2.
As for the publisher node, this patch allows to reuse logical
walsender processes
after the streaming is done once.

~

Is this paragraph even needed? Since the connection is reused then it
already implies the other end (the Wlasender) is being reused, right?

======
src/backend/replication/logical/tablesync.c

3.
+ * FIXME: set appropriate application_name. Previously, the slot name was used
+ * because the lifetime of the tablesync worker was same as that, but now the
+ * tablesync worker handles many slots during the synchronization so that it is
+ * not suitable. So what should be? Note that if the tablesync worker starts to
+ * reuse the replication slot during synchronization, we should use the slot
+ * name as application_name again.
+ */
+static void
+ApplicationNameForTablesync(Oid suboid, int worker_slot,
+ char *application_name, Size szapp)

3a.
I felt that most of this FIXME comment belongs with the calling code,
not here.

3b.
Also, maybe it needs some rewording -- I didn't understand exactly
what it is trying to say.

~~~

4.
- /*
- * Here we use the slot name instead of the subscription name as the
- * application_name, so that it is different from the leader apply worker,
- * so that synchronous replication can distinguish them.
- */
- LogRepWorkerWalRcvConn =
- walrcv_connect(MySubscription->conninfo, true,
- must_use_password,
- slotname, &err);
+ /* Connect to the publisher if haven't done so already. */
+ if (LogRepWorkerWalRcvConn == NULL)
+ {
+ char application_name[NAMEDATALEN];
+
+ /*
+ * The application_name must be also different from the leader apply
+ * worker because synchronous replication must distinguish them.
+ */
+ ApplicationNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->worker_slot,
+ application_name,
+ NAMEDATALEN);
+ LogRepWorkerWalRcvConn =
+ walrcv_connect(MySubscription->conninfo, true,
+ must_use_password,
+ application_name, &err);
+ }
+

Should the comment mention the "subscription name" as it did before?

SUGGESTION
The application_name must differ from the subscription name (used by
the leader apply worker) because synchronous replication has to be
able to distinguish this worker from the leader apply worker.

======
src/backend/replication/logical/worker.c

5.
-start_table_sync(XLogRecPtr *origin_startpos, char **myslotname)
+start_table_sync(XLogRecPtr *origin_startpos,
+ char **myslotname)

This is a wrapping change only. It looks like an unnecessary hangover
from a previous version of 0003.

======
src/backend/replication/walsender.c

6. exec_replication_command

+
if (cmd->kind == REPLICATION_KIND_PHYSICAL)
StartReplication(cmd);
~

The extra blank line does not belong in this patch.

======
src/include/replication/worker_internal.h

+ /* Indicates the slot number which corresponds to this LogicalRepWorker. */
+ int worker_slot;
+

6a
I think this field is very fundamental, so IMO it should be defined at
the top of the struct, maybe nearby the other 'in_use' and
'generation' fields.

~

6b.
Also, since this is already a "worker" struct so there is no need to
have "worker" in the field name again -- just "slot_number" or
"slotnum" might be a better name.

And then the comment can also be simplified.

SUGGESTION
/* Slot number of this worker. */
int slotnum;

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2023-07-14 08:24:43 Re: Remove distprep
Previous Message Amit Kapila 2023-07-14 08:13:04 Re: [Patch] Use *other* indexes on the subscriber when REPLICA IDENTITY is FULL