RE: Time delayed LR (WAS Re: logical replication restrictions)

From: "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Peter Smith' <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shveta(dot)malik(at)gmail(dot)com" <shveta(dot)malik(at)gmail(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2023-02-02 08:18:49
Message-ID: TYCPR01MB8373542038DA5A274D6CE115EDD69@TYCPR01MB8373.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Wednesday, February 1, 2023 1:37 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> Here are my review comments for the patch v25-0001.
Thank you for your review !

> ======
> Commit Message
>
> 1.
> The other possibility is to apply the delay at the end of the parallel apply
> transaction but that would cause issues related to resource bloat and locks being
> held for a long time.
>
> ~
>
> SUGGESTION
> We chose not to apply the delay at the end of the parallel apply transaction
> because that would cause issues related to resource bloat and locks being held
> for a long time.
I prefer the current description. So, I just changed one word
from "The other possibility is..." to "The other possibility was"
to indicate both two paragraphs (this paragraph and the previous paragraph)
are related.

> ======
> doc/src/sgml/config.sgml
>
> 2.
> + <para>
> + For time-delayed logical replication, the apply worker sends a feedback
> + message to the publisher every
> + <varname>wal_receiver_status_interval</varname> milliseconds.
> Make sure
> + to set <varname>wal_receiver_status_interval</varname> less than
> the
> + <varname>wal_sender_timeout</varname> on the publisher,
> otherwise, the
> + <literal>walsender</literal> will repeatedly terminate due to timeout
> + error. Note that if <varname>wal_receiver_status_interval</varname>
> is
> + set to zero, the apply worker sends no feedback messages during the
> + <literal>min_apply_delay</literal> period.
> + </para>
>
> 2a.
> "due to timeout error." --> "due to timeout errors."
Fixed.

> ~
>
> 2b.
> Shouldn't this also cross-ref to CREATE SUBSCRIPTION docs? Because the
> above mentions 'min_apply_delay' but that is not defined on this page.
Makes sense. Added.

> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> 3.
> + <para>
> + By default, the subscriber applies changes as soon as possible. This
> + parameter allows the user to delay the application of changes by a
> + given time period. If the value is specified without units, it is
> + taken as milliseconds. The default is zero (no delay). See
> + <xref linkend="config-setting-names-values"/> for details on the
> + available valid time unites.
> + </para>
>
> Typo: "unites"
Fixed it to "units".

> ~~~
>
> 4.
> + <para>
> + Any delay becomes effective after all initial table synchronization
> + has finished and occurs before each transaction starts to get applied
> + on the subscriber. The delay is calculated as the difference between
> + the WAL timestamp as written on the publisher and the current time
> on
> + the subscriber. Any overhead of time spent in logical decoding and in
> + transferring the transaction may reduce the actual wait time. It is
> + also possible that the overhead already exceeds the requested
> + <literal>min_apply_delay</literal> value, in which case no delay is
> + applied. If the system clocks on publisher and subscriber are not
> + synchronized, this may lead to apply changes earlier than expected,
> + but this is not a major issue because this parameter is typically
> + much larger than the time deviations between servers. Note that if
> + this parameter is set to a long delay, the replication will stop if
> + the replication slot falls behind the current LSN by more than
> + <link
> linkend="guc-max-slot-wal-keep-size"><literal>max_slot_wal_keep_size</liter
> al></link>.
> + </para>
>
> "Any delay becomes effective after all initial table synchronization..." --> "Any
> delay becomes effective only after all initial table synchronization..."
Agreed. Fixed.

> ~~~
>
> 5.
> + <warning>
> + <para>
> + Delaying the replication means there is a much longer time
> between
> + making a change on the publisher, and that change being
> committed
> + on the subscriber. This can impact the performance of
> synchronous
> + replication. See <xref linkend="guc-synchronous-commit"/>
> + parameter.
> + </para>
> + </warning>
>
>
> I'm not sure why this was text changed to say "means there is a much longer
> time" instead of "can mean there is a much longer time".
>
> IMO the previous wording was better because this current text makes an
> assumption about what the user has configured -- e.g. if they configured only
> 1ms delay then the warning text is not really relevant.
Yes, I changed here. The reason is that the purpose of this feature
is to address unintentional wrong operations on the pub and for that purpose,
I didn't feel quite very short time like you mentioned might not be set for this parameter
after some community's comments from hackers. Either was fine,
but I chose the current description, depending on the purpose.

> ~~~
>
> 6.
> Why was the example (it existed when I last looked at patch v19) removed?
> Personally, I found that example to be a useful reminder that the
> min_apply_delay can specify units other than just 'ms'.
Removed because the example was one variation that used one difference value of
WITH clause, after some comments from the hackers.
The reference for available units is documented,
so the current description should be sufficient.

> ======
> src/backend/commands/subscriptioncmds.c
>
> 7. parse_subscription_options
>
> + /*
> + * The combination of parallel streaming mode and min_apply_delay is
> + not
> + * allowed. This is because we start applying the transaction stream as
> + * soon as the first change arrives without knowing the transaction's
> + * prepare/commit time. This means we cannot calculate the underlying
> + * network/decoding lag between publisher and subscriber, and so always
> + * waiting for the full 'min_apply_delay' period might include
> + unnecessary
> + * delay.
> + *
> + * The other possibility is to apply the delay at the end of the
> + parallel
> + * apply transaction but that would cause issues related to resource
> + bloat
> + * and locks being held for a long time.
> + */
>
> I think the 2nd paragraph should be changed slightly as follows (like review
> comment #1)
>
> SUGGESTION
> Note - we chose not to apply the delay at the end of the parallel apply
> transaction because that would cause issues related to resource bloat and locks
> being held for a long time.
Same as the first comment, changed only "is" to "was",
to indicate the last paragraph is related to past discussion(option)
for the parallel streaming mode that was not adopted.

> ~~~
>
> 8.
> + if (IsSet(supported_opts, SUBOPT_MIN_APPLY_DELAY) &&
> + opts->min_apply_delay > 0 && opts->streaming ==
> + opts->LOGICALREP_STREAM_PARALLEL)
> + ereport(ERROR,
> + errcode(ERRCODE_SYNTAX_ERROR),
>
> Saying "> 0" (in the condition) is not strictly necessary here, since it is never < 0.
This check is necessary.

For example, imagine a case when we CREATE a subscription with streaming = on
and then try to ALTER the subscription with streaming = parallel
without any settings for min_apply_delay. The ALTER command
throws an error of "min_apply_delay > 0 and streaming = parallel are
mutually exclusive options." then.

This is because min_apply_delay is supported by ALTER command
(so the first condition becomes true) and we set
streaming = parallel (which makes the 2nd condition true).

So, we need to check the opts's actual min_apply_delay value
to make the irrelavent case pass.
> ~~~
>
> 9. AlterSubscription
>
> + /*
> + * The combination of parallel streaming mode and
> + * min_apply_delay is not allowed. See
> + * parse_subscription_options for details of the reason.
> + */
> + if (opts.streaming == LOGICALREP_STREAM_PARALLEL) if
> + ((IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY) &&
> opts.min_apply_delay > 0) ||
> + (!IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY) &&
> sub->minapplydelay > 0))
>
> Saying "> 0" (in the condition) is not strictly necessary here, since it is never < 0.
This is also necessary.

For example, imagine a case that
there is a subscription whose min_apply_delay is 1 day.
Then, you want to try to execute ALTER SUBSCRIPTION
with (min_apply_delay = 0, streaming = parallel).
If we remove the condition of otps.min_apply_delay > 0,
then we error out in this case too.

First we pass the first condition
of the opts.streaming == LOGICALREP_STREAM_PARALLEL,
since we use streaming option.
Then, we also set min_apply_delay in this example,
then without checking the value of min_apply_delay,
the second condition becomes true
(IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY)).

So, we need to make this case(min_apply_delay = 0) pass.
Meanwhile, checking the "sub" value is necessary for checking existing subscription value.
> ~~~
>
> 10.
> + if (IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY)) {
> + /*
> + * The combination of parallel streaming mode and
> + * min_apply_delay is not allowed.
> + */
> + if (opts.min_apply_delay > 0)
>
> Saying "> 0" (in the condition) is not strictly necessary here, since it is never < 0.
This is also required to check the value equals to 0 or not.
Kindly imagine a case when we want to execute ALTER min_apply_delay from 1day
with a pair of (min_apply_delay = 0 and
streaming = parallel). If we remove this check, then this ALTER command fails
with error. Without the check, when we set min_apply_delay
and parallel streaming mode, even when making the min_apply_delay 0,
the error is invoked.

The check for sub.stream is necessary for existing definition of target subscription.
> ~~~
>
> 11. defGetMinApplyDelay
>
> + /*
> + * Check lower bound. parse_int() has already been confirmed that
> + result
> + * is less than or equal to INT_MAX.
> + */
>
> The parse_int already checks < INT_MAX. But on return from that function,
> don’t you need to check again that it is < PG_INT32_MAX (in case those are
> different)
>
> (I think Kuroda-san already suggested same as this)
Changed according to the discussion.

> ======
> src/backend/replication/logical/worker.c
>
> 12.
> +/*
> + * In order to avoid walsender timeout for time-delayed logical
> +replication the
> + * apply worker keeps sending feedback messages during the delay period.
> + * Meanwhile, the feature delays the apply before the start of the
> + * transaction and thus we don't write WAL records for the suspended
> +changes
> + * during the wait. When the apply worker sends a feedback message
> +during the
> + * delay, we should not make positions of the flushed and apply LSN
> +overwritten
> + * by the last received latest LSN. See send_feedback() for details.
> + */
>
> "we should not make positions of the flushed and apply LSN overwritten" -->
> "we should overwrite positions of the flushed and apply LSN"
Fixed. I added "not" in your suggestion, too.

> ~~~
>
> 14. send_feedback
>
> @@ -3738,8 +3867,15 @@ send_feedback(XLogRecPtr recvpos, bool force, bool
> requestReply)
> /*
> * No outstanding transactions to flush, we can report the latest received
> * position. This is important for synchronous replication.
> + *
> + * If the logical replication subscription has unprocessed changes then
> + do
> + * not inform the publisher that the received latest LSN is already
> + * applied and flushed, otherwise, the publisher will make a wrong
> + * assumption about the logical replication progress. Instead, it just
> + * sends a feedback message to avoid a replication timeout during the
> + * delay.
> */
>
> "Instead, it just sends" --> "Instead, just send"
Fixed.

> ======
> src/bin/pg_dump/pg_dump.h
>
> 15. SubscriptionInfo
>
> @@ -661,6 +661,7 @@ typedef struct _SubscriptionInfo
> char *subdisableonerr;
> char *suborigin;
> char *subsynccommit;
> + int subminapplydelay;
> char *subpublications;
> } SubscriptionInfo;
>
> Should this also be "int32" to match the other member type changes?
This is intentional.
In the context of pg_dump, we are treating
this same as other int32 catalog members.
So, I'd like to keep the current code.

> ======
> src/test/subscription/t/032_apply_delay.pl
>
> 16.
> +# Make sure the apply worker knows to wait for more than 500ms
> +check_apply_delay_log($node_subscriber, $offset, "0.5");
>
> "knows to wait for more than" --> "waits for more than"
>
> (this occurs in a couple of places)
Fixed.

Kindly have a look at v26 shared in [1].

[1] - https://www.postgresql.org/message-id/TYCPR01MB83730A45925B9680C40D92AFEDD69%40TYCPR01MB8373.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Takamichi Osumi (Fujitsu) 2023-02-02 08:21:18 RE: Time delayed LR (WAS Re: logical replication restrictions)
Previous Message Takamichi Osumi (Fujitsu) 2023-02-02 08:03:55 RE: Time delayed LR (WAS Re: logical replication restrictions)