Re: Time delayed LR (WAS Re: logical replication restrictions)

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "shveta(dot)malik(at)gmail(dot)com" <shveta(dot)malik(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2023-02-20 22:01:10
Message-ID: CAHut+Pu6Y+BkYKg6MYGi2zGnx6imeK4QzxBVhpQoPMeDr1npnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for patch v3-0001.

(I haven't looked at the test code yet)

======
Commit Message

1.
If the subscription sets min_send_delay parameter, an apply worker passes the
value to the publisher as an output plugin option. And then, the walsender will
delay the transaction sending for given milliseconds.

~

1a.
"an apply worker" --> "the apply worker (via walrcv_startstreaming)".

~

1b.
"And then, the walsender" --> "The walsender"

~~~

2.
The combination of parallel streaming mode and min_send_delay is not allowed.
This is because in parallel streaming mode, we start applying the transaction
stream as soon as the first change arrives without knowing the transaction's
prepare/commit time. Always waiting for the full 'min_send_delay' period might
include unnecessary delay.

~

Is there another reason not to support this?

Even if streaming + min_send_delay incurs some extra delay, is that a
reason to reject outright the combination? What difference will the
potential of a few extra seconds overhead make when min_send_delay is
more likely to be far greater (e.g. minutes or hours)?

~~~

3.
The other possibility was to apply the delay at the end of the parallel apply
transaction but that would cause issues related to resource bloat and
locks being
held for a long time.

~

Is this explanation still relevant now you are doing pub-side delays?

======
doc/src/sgml/catalogs.sgml

4.
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>subminsenddelay</structfield> <type>int4</type>
+ </para>
+ <para>
+ The minimum delay, in milliseconds, by the publisher to send changes
+ </para></entry>
+ </row>

"by the publisher to send changes" --> "by the publisher before sending changes"

======
doc/src/sgml/logical-replication.sgml

5.
+ <para>
+ A publication can delay sending changes to the subscription by specifying
+ the <literal>min_send_delay</literal> subscription parameter. See
+ <xref linkend="sql-createsubscription"/> for details.
+ </para>

~

This description seemed backwards because IIUC the PUBLICATION has
nothing to do with the delay really, the walsender is told what to do
by the SUBSCRIPTION. Anyway, this paragraph is in the "Subscriber"
section, so mentioning publications was a bit confusing.

SUGGESTION
A subscription can delay the receipt of changes by specifying the
min_send_delay subscription parameter. See ...

======
doc/src/sgml/monitoring.sgml

6.
+ <row>
+ <entry><literal>WalSenderSendDelay</literal></entry>
+ <entry>Waiting while sending changes for time-delayed logical replication
+ in the WAL sender process.</entry>
+ </row>

Should this say "Waiting before sending changes", instead of "Waiting
while sending changes"?

======
doc/src/sgml/ref/create_subscription.sgml

7.
+ <para>
+ By default, the publisher sends changes as soon as possible. This
+ parameter allows the user to delay the publisher to send changes by
+ given time period. If the value is specified without units, it is
+ taken as milliseconds. The default is zero (no delay). See
+ <xref linkend="config-setting-names-values"/> for details on the
+ available valid time units.
+ </para>

"to delay the publisher to send changes" --> "to delay changes"

~~~

8.
+ <para>
+ The delay is effective only when the initial table synchronization
+ has been finished and the publisher decides to send a particular
+ transaction downstream. The delay does not take into account the
+ overhead of time spent in transferring the transaction, which means
+ that the arrival time at the subscriber may be delayed more than the
+ given time.
+ </para>

I'm not sure about this mention about only "effective only when the
initial table synchronization has been finished"... Now that the delay
is pub-side I don't know that it is true anymore. The tablesync worker
will try to synchronize with the apply worker. IIUC during this
"synchronization" phase the apply worker might be getting delayed by
its own walsender, so therefore the tablesync might also be delayed
(due to syncing with the apply worker) won't it?

======
src/backend/commands/subscriptioncmds.c

9.
+ /*
+ * translator: the first %s is a string of the form "parameter > 0"
+ * and the second one is "option = value".
+ */
+ errmsg("%s and %s are mutually exclusive options",
+ "min_send_delay > 0", "streaming = parallel"));
+
+
}

Excessive whitespace.

======
src/backend/replication/logical/worker.c

10. ApplyWorkerMain

+ /*
+ * Time-delayed logical replication does not support tablesync
+ * workers, so only the leader apply worker can request walsenders to
+ * apply delay on the publisher side.
+ */
+ if (server_version >= 160000 && MySubscription->minsenddelay > 0)
+ options.proto.logical.min_send_delay = MySubscription->minsenddelay;

"apply delay" --> "delay"

======
src/backend/replication/pgoutput/pgoutput.c

11.
+ errno = 0;
+ parsed = strtoul(strVal(defel->arg), &endptr, 10);
+ if (errno != 0 || *endptr != '\0')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid min_send_delay")));
+
+ if (parsed > PG_INT32_MAX)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("min_send_delay \"%s\" out of range",
+ strVal(defel->arg))));

Should the validation be also checking/asserting no negative numbers,
or actually should the min_send_delay be defined as a uint32 in the
first place?

~~~

12. pgoutput_startup

@@ -501,6 +528,15 @@ pgoutput_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
else
ctx->twophase_opt_given = true;

+ if (data->min_send_delay &&
+ data->protocol_version < LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("requested proto_version=%d does not support delay sending
data, need %d or higher",
+ data->protocol_version, LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM)));
+ else
+ ctx->min_send_delay = data->min_send_delay;

IMO it doesn't make sense to compare this new feature with the
unrelated LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM protocol
version. I think we should define a new constant
LOGICALREP_PROTO_MIN_SEND_DELAY_VERSION_NUM (even if it has the same
value as the LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM).

======
src/backend/replication/walsender.c

13. WalSndDelay

+ long diffms;
+ long timeout_interval_ms;

IMO some more informative name for these would make the code read better:

'diffms' --> 'remaining_wait_time_ms'
'timeout_interval_ms' --> 'timeout_sleeptime_ms'

~~~

14.
+ /* Sleep until we get reply from worker or we time out */
+ WalSndWait(WL_SOCKET_READABLE,
+ Min(timeout_interval_ms, diffms),
+ WAIT_EVENT_WALSENDER_SEND_DELAY);

Sorry, I didn't understand this comment "reply from worker"... AFAIK
here we are just sleeping, not waiting for replies from anywhere (???)

======
src/include/replication/logical.h

15.
@@ -64,6 +68,7 @@ typedef struct LogicalDecodingContext
LogicalOutputPluginWriterPrepareWrite prepare_write;
LogicalOutputPluginWriterWrite write;
LogicalOutputPluginWriterUpdateProgress update_progress;
+ LogicalOutputPluginWriterDelay delay;

~

15a.
Question: Is there some advantage to introducing another callback,
instead of just doing the delay inline?

~

15b.
Should this be a more informative member name like 'delay_send'?

~~~

16.
@@ -100,6 +105,8 @@ typedef struct LogicalDecodingContext
*/
bool twophase_opt_given;

+ int32 min_send_delay;
+

Missing comment for this new member.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2023-02-20 22:16:54 Re: Missing free_var() at end of accum_sum_final()?
Previous Message Tom Lane 2023-02-20 20:48:04 Re: Killing off removed rels properly