RE: Time delayed LR (WAS Re: logical replication restrictions)

From: "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Euler Taveira' <euler(at)eulerto(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shveta(dot)malik(at)gmail(dot)com" <shveta(dot)malik(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2023-01-24 14:57:22
Message-ID: TYCPR01MB837391239FF0BA673697D21EEDC99@TYCPR01MB8373.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, January 24, 2023 8:32 AM Euler Taveira <euler(at)eulerto(dot)com> wrote:
> Good to know that you keep improving this patch. I have a few suggestions that
> were easier to provide a patch on top of your latest patch than to provide an
> inline suggestions.
Thanks for your review ! We basically adopted your suggestions.

> There are a few documentation polishing. Let me comment some of them above.
>
> - The length of time (ms) to delay the application of changes.
> + Total time spent delaying the application of changes, in milliseconds
>
> I don't remember if I suggested this description for catalog but IMO the
> suggestion reads better for me.
Adopted the above change.

> - For time-delayed logical replication (i.e. when the subscription is
> - created with parameter min_apply_delay > 0), the apply worker sends a
> - Standby Status Update message to the publisher with a period of
> - <literal>wal_receiver_status_interval</literal>. Make sure to set
> - <literal>wal_receiver_status_interval</literal> less than the
> - <literal>wal_sender_timeout</literal> on the publisher, otherwise, the
> - walsender will repeatedly terminate due to the timeout errors. If
> - <literal>wal_receiver_status_interval</literal> is set to zero, the apply
> - worker doesn't send any feedback messages during the subscriber's
> - <literal>min_apply_delay</literal> period. See
> - <xref linkend="sql-createsubscription"/> for details.
> + For time-delayed logical replication, the apply worker sends a feedback
> + message to the publisher every
> + <varname>wal_receiver_status_interval</varname> milliseconds. Make sure
> + to set <varname>wal_receiver_status_interval</varname> less than the
> + <varname>wal_sender_timeout</varname> on the publisher, otherwise, the
> + <literal>walsender</literal> will repeatedly terminate due to timeout
> + error. If <varname>wal_receiver_status_interval</varname> is set to
> + zero, the apply worker doesn't send any feedback messages during the
> + <literal>min_apply_delay</literal> interval.
>
> I removed the parenthesis explanation about time-delayed logical replication.
> If you are reading the documentation and does not know what it means you should
> (a) read the logical replication chapter or (b) check the glossary (maybe a new
> entry should be added). I also removed the Standby status Update message but it
> is a low level detail; let's refer to it as feedback message as the other
> sentences do. I changed "literal" to "varname" that's the correct tag for
> parameters. I replace "period" with "interval" that was the previous
> terminology. IMO we should be uniform, use one or the other.
Adopted.

Also, I added the glossary for time-delayed replication (one for
applicable to both physical replication and logical replication).
Plus, I united the term "interval" to period, because it would clarify the type for this feature.
I think this is better.
> - The subscriber replication can be instructed to lag behind the publisher
> - side changes by specifying the <literal>min_apply_delay</literal>
> - subscription parameter. See <xref linkend="sql-createsubscription"/> for
> - details.
> + A logical replication subscription can delay the application of changes by
> + specifying the <literal>min_apply_delay</literal> subscription parameter.
> + See <xref linkend="sql-createsubscription"/> for details.
>
> This feature refers to a specific subscription, hence, "logical replication
> subscription" instead of "subscriber replication".
Adopted.

> + if (IsSet(opts->specified_opts, SUBOPT_MIN_APPLY_DELAY))
> + errorConflictingDefElem(defel, pstate);
> +
>
> Peter S referred to this missing piece of code too.
Added.

> -int
> +static int
> defGetMinApplyDelay(DefElem *def)
> {
>
> It seems you forgot static keyword.
Fixed.

> - elog(DEBUG2, "time-delayed replication for txid %u, min_apply_delay = %lld ms, Remaining wait time: %ld ms",
> - xid, (long long) MySubscription->minapplydelay, diffms);
> + elog(DEBUG2, "time-delayed replication for txid %u, min_apply_delay = " INT64_FORMAT " ms, remaining wait time: %ld ms",
> + xid, MySubscription->minapplydelay, diffms);
> int64 should use format modifier INT64_FORMAT.
Fixed.

> - (long) wal_receiver_status_interval * 1000,
> + wal_receiver_status_interval * 1000L,
>
> Cast is not required. I added a suffix to the constant.
Fixed.

> - elog(DEBUG2, "sending feedback (force %d) to recv %X/%X, write %X/%X, flush %X/%X in-delayed: %d",
> + elog(DEBUG2, "sending feedback (force %d) to recv %X/%X, write %X/%X, flush %X/%X, apply delay: %s",
> force,
> LSN_FORMAT_ARGS(recvpos),
> LSN_FORMAT_ARGS(writepos),
> LSN_FORMAT_ARGS(flushpos),
> - in_delayed_apply);
> + in_delayed_apply? "yes" : "no");
>
> It is better to use a string to represent the yes/no option.
Fixed.

> - gettext_noop("Min apply delay (ms)"));
> + gettext_noop("Min apply delay"));
>
> I don't know if it was discussed but we don't add units to headers. When I
> think about this parameter representation (internal and external), I decided to
> use the previous code because it provides a unit for external representation. I
> understand that using the same representation as recovery_min_apply_delay is
> good but the current code does not handle the external representation
> accordingly. (recovery_min_apply_delay uses the GUC machinery to adds the unit
> but for min_apply_delay, it doesn't).
Adopted.

> # Setup for streaming case
> -$node_publisher->append_conf('postgres.conf',
> +$node_publisher->append_conf('postgresql.conf',
> 'logical_decoding_mode = immediate');
> $node_publisher->reload;
>
> Fix configuration file name.
Fixed.

> Maybe tests should do a better job. I think check_apply_delay_time is fragile
> because it does not guarantee that time is not shifted. Time-delayed
> replication is a subscriber feature and to check its correctness it should
> check the logs.
>
> # Note that we cannot call check_apply_delay_log() here because there is a
> # possibility that the delay is skipped. The event happens when the WAL
> # replication between publisher and subscriber is delayed due to a mechanical
> # problem. The log output will be checked later - substantial delay-time case.
>
> If you might not use the logs for it, it should adjust the min_apply_delay, no?
Yes. Adjusted.

> It does not exercise the min_apply_delay vs parallel streaming mode.
>
> + /*
> + * The combination of parallel streaming mode and
> + * min_apply_delay is not allowed.
> + */
> + if (opts.streaming == LOGICALREP_STREAM_PARALLEL)
> + if ((IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY) && opts.min_apply_delay > 0) ||
> + (!IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY) && sub->minapplydelay > 0))
> + ereport(ERROR,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot enable %s mode for subscription with %s",
> + "streaming = parallel", "min_apply_delay"));
> +
>
> Is this code correct? I also didn't like this message. "cannot enable streaming
> = parallel mode for subscription with min_apply_delay" is far from a good error
> message. How about refer parallelism to "parallel streaming mode".
Yes. opts is the input for alter command and sub object is the existing definition.
We need to check those combinations like when streaming is set to parallel
and min_apply_delay also gets set, then, min_apply_delay should not be bigger than 0, for example.
Besides, adopted your suggestion to improve the comments.

Attach the patch in [1]. Kindly have a look at it.

[1] - https://www.postgresql.org/message-id/TYCPR01MB8373DC1881F382B4703F26E0EDC99%40TYCPR01MB8373.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2023-01-24 14:59:06 Re: Minimal logical decoding on standbys
Previous Message Tom Lane 2023-01-24 14:54:57 Re: run pgindent on a regular basis / scripted manner