RE: Time delayed LR (WAS Re: logical replication restrictions)

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Peter Smith' <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "shveta(dot)malik(at)gmail(dot)com" <shveta(dot)malik(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2023-02-21 07:57:57
Message-ID: TYAPR01MB5866C6BCA4D9386D9C486033F5A59@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Peter,

Thank you for reviewing! PSA new version.

> 1.
> If the subscription sets min_send_delay parameter, an apply worker passes the
> value to the publisher as an output plugin option. And then, the walsender will
> delay the transaction sending for given milliseconds.
>
> ~
>
> 1a.
> "an apply worker" --> "the apply worker (via walrcv_startstreaming)".
>
> ~
>
> 1b.
> "And then, the walsender" --> "The walsender"

Fixed.

> 2.
> The combination of parallel streaming mode and min_send_delay is not allowed.
> This is because in parallel streaming mode, we start applying the transaction
> stream as soon as the first change arrives without knowing the transaction's
> prepare/commit time. Always waiting for the full 'min_send_delay' period might
> include unnecessary delay.
>
> ~
>
> Is there another reason not to support this?
>
> Even if streaming + min_send_delay incurs some extra delay, is that a
> reason to reject outright the combination? What difference will the
> potential of a few extra seconds overhead make when min_send_delay is
> more likely to be far greater (e.g. minutes or hours)?

Another case I came up with is that streaming transactions are come continuously.
If there are many transactions to be streamed, the walsender must delay to send for
every transactions, for the given period. It means that arrival of transactions at
the subscriber may delay for approximately min_send_delay x # of transactions.

> 3.
> The other possibility was to apply the delay at the end of the parallel apply
> transaction but that would cause issues related to resource bloat and
> locks being
> held for a long time.
>
> ~
>
> Is this explanation still relevant now you are doing pub-side delays?

Slightly reworded. I think the problem may be occurred if we delay sending COMMIT
record for parallel applied transactions.

> doc/src/sgml/catalogs.sgml
>
> 4.
> + <row>
> + <entry role="catalog_table_entry"><para role="column_definition">
> + <structfield>subminsenddelay</structfield> <type>int4</type>
> + </para>
> + <para>
> + The minimum delay, in milliseconds, by the publisher to send changes
> + </para></entry>
> + </row>
>
> "by the publisher to send changes" --> "by the publisher before sending changes"

As Amit said[1], there is a possibility to delay after sending delay. So I changed to
"before sending COMMIT record". How do you think?

> doc/src/sgml/logical-replication.sgml
>
> 5.
> + <para>
> + A publication can delay sending changes to the subscription by specifying
> + the <literal>min_send_delay</literal> subscription parameter. See
> + <xref linkend="sql-createsubscription"/> for details.
> + </para>
>
> ~
>
> This description seemed backwards because IIUC the PUBLICATION has
> nothing to do with the delay really, the walsender is told what to do
> by the SUBSCRIPTION. Anyway, this paragraph is in the "Subscriber"
> section, so mentioning publications was a bit confusing.
>
> SUGGESTION
> A subscription can delay the receipt of changes by specifying the
> min_send_delay subscription parameter. See ...

Changed.

> doc/src/sgml/monitoring.sgml
>
> 6.
> + <row>
> + <entry><literal>WalSenderSendDelay</literal></entry>
> + <entry>Waiting while sending changes for time-delayed logical
> replication
> + in the WAL sender process.</entry>
> + </row>
>
> Should this say "Waiting before sending changes", instead of "Waiting
> while sending changes"?

Per discussion[1], I did not fix.

> doc/src/sgml/ref/create_subscription.sgml
>
> 7.
> + <para>
> + By default, the publisher sends changes as soon as possible. This
> + parameter allows the user to delay the publisher to send changes by
> + given time period. If the value is specified without units, it is
> + taken as milliseconds. The default is zero (no delay). See
> + <xref linkend="config-setting-names-values"/> for details on the
> + available valid time units.
> + </para>
>
> "to delay the publisher to send changes" --> "to delay changes"

Fixed.

> 8.
> + <para>
> + The delay is effective only when the initial table synchronization
> + has been finished and the publisher decides to send a particular
> + transaction downstream. The delay does not take into account the
> + overhead of time spent in transferring the transaction, which means
> + that the arrival time at the subscriber may be delayed more than the
> + given time.
> + </para>
>
> I'm not sure about this mention about only "effective only when the
> initial table synchronization has been finished"... Now that the delay
> is pub-side I don't know that it is true anymore. The tablesync worker
> will try to synchronize with the apply worker. IIUC during this
> "synchronization" phase the apply worker might be getting delayed by
> its own walsender, so therefore the tablesync might also be delayed
> (due to syncing with the apply worker) won't it?

I tested and checked codes. First of all, the tablesync worker request to send WALs
without min_send_delay, so changes will be sent and applied with no delays. In this meaning,
the table synchronization has not been affected by the feature. While checking,
however, there is a possibility that the state of table will be delayed to get
'readly' because the changing of status from SYNCDONE from READY is done by apply worker.
It may lead that two-phase will be delayed in getting to "enabled".
I added descriptions about it.

> src/backend/commands/subscriptioncmds.c
>
> 9.
> + /*
> + * translator: the first %s is a string of the form "parameter > 0"
> + * and the second one is "option = value".
> + */
> + errmsg("%s and %s are mutually exclusive options",
> + "min_send_delay > 0", "streaming = parallel"));
> +
> +
> }
>
> Excessive whitespace.

Adjusted.

> src/backend/replication/logical/worker.c
>
> 10. ApplyWorkerMain
>
> + /*
> + * Time-delayed logical replication does not support tablesync
> + * workers, so only the leader apply worker can request walsenders to
> + * apply delay on the publisher side.
> + */
> + if (server_version >= 160000 && MySubscription->minsenddelay > 0)
> + options.proto.logical.min_send_delay = MySubscription->minsenddelay;
>
> "apply delay" --> "delay"

Fixed.

> src/backend/replication/pgoutput/pgoutput.c
>
> 11.
> + errno = 0;
> + parsed = strtoul(strVal(defel->arg), &endptr, 10);
> + if (errno != 0 || *endptr != '\0')
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("invalid min_send_delay")));
> +
> + if (parsed > PG_INT32_MAX)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("min_send_delay \"%s\" out of range",
> + strVal(defel->arg))));
>
> Should the validation be also checking/asserting no negative numbers,
> or actually should the min_send_delay be defined as a uint32 in the
> first place?

I think you are right because min_apply_delay does not have related code.
we must consider additional possibility that user may send START_REPLICATION
by hand and it has minus value.
Fixed.

> 12. pgoutput_startup
>
> @@ -501,6 +528,15 @@ pgoutput_startup(LogicalDecodingContext *ctx,
> OutputPluginOptions *opt,
> else
> ctx->twophase_opt_given = true;
>
> + if (data->min_send_delay &&
> + data->protocol_version <
> LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("requested proto_version=%d does not support delay sending
> data, need %d or higher",
> + data->protocol_version,
> LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM)));
> + else
> + ctx->min_send_delay = data->min_send_delay;
>
>
> IMO it doesn't make sense to compare this new feature with the
> unrelated LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM protocol
> version. I think we should define a new constant
> LOGICALREP_PROTO_MIN_SEND_DELAY_VERSION_NUM (even if it has the
> same
> value as the LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM).

Added.

> src/backend/replication/walsender.c
>
> 13. WalSndDelay
>
> + long diffms;
> + long timeout_interval_ms;
>
> IMO some more informative name for these would make the code read better:
>
> 'diffms' --> 'remaining_wait_time_ms'
> 'timeout_interval_ms' --> 'timeout_sleeptime_ms'

Changed.

> 14.
> + /* Sleep until we get reply from worker or we time out */
> + WalSndWait(WL_SOCKET_READABLE,
> + Min(timeout_interval_ms, diffms),
> + WAIT_EVENT_WALSENDER_SEND_DELAY);
>
> Sorry, I didn't understand this comment "reply from worker"... AFAIK
> here we are just sleeping, not waiting for replies from anywhere (???)
>
> ======
> src/include/replication/logical.h
>
> 15.
> @@ -64,6 +68,7 @@ typedef struct LogicalDecodingContext
> LogicalOutputPluginWriterPrepareWrite prepare_write;
> LogicalOutputPluginWriterWrite write;
> LogicalOutputPluginWriterUpdateProgress update_progress;
> + LogicalOutputPluginWriterDelay delay;
>
> ~
>
> 15a.
> Question: Is there some advantage to introducing another callback,
> instead of just doing the delay inline?

IIUC functions related with walsender should not be called directly, because there
is a possibility that replication slots are manipulated from the backed.

> 15b.
> Should this be a more informative member name like 'delay_send'?

Changed.

> 16.
> @@ -100,6 +105,8 @@ typedef struct LogicalDecodingContext
> */
> bool twophase_opt_given;
>
> + int32 min_send_delay;
> +
>
> Missing comment for this new member.

Added.

[1]: https://www.postgresql.org/message-id/CAA4eK1+JwLAVAOphnZ1YTiEV_jOMAE6JgJmBE98oek2cg7XF0w@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v4-0001-Time-delayed-logical-replication-on-publisher-sid.patch application/octet-stream 82.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-02-21 07:58:58 RE: Time delayed LR (WAS Re: logical replication restrictions)
Previous Message Jim Jones 2023-02-21 07:23:26 Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist