Re: Time delayed LR (WAS Re: logical replication restrictions)

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: osumi(dot)takamichi(at)fujitsu(dot)com
Cc: amit(dot)kapila16(at)gmail(dot)com, smithpb2250(at)gmail(dot)com, vignesh21(at)gmail(dot)com, kuroda(dot)hayato(at)fujitsu(dot)com, shveta(dot)malik(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, euler(at)eulerto(dot)com, m(dot)melihmutlu(at)gmail(dot)com, andres(at)anarazel(dot)de, marcos(at)f10(dot)com(dot)br, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2023-01-25 06:26:54
Message-ID: 20230125.152654.489798779535073568.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Tue, 24 Jan 2023 12:19:04 +0000, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com> wrote in
> Attached the patch v20 that has incorporated all comments so far.

Thanks! I looked through the documentation part.

+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>subminapplydelay</structfield> <type>int8</type>
+ </para>
+ <para>
+ Total time spent delaying the application of changes, in milliseconds.
+ </para></entry>

I was confused because it reads as if this column shows the accumulated
actual waiting time caused by min_apply_delay. IIUC it actually shows
the min_apply_delay setting for the subscription. Thus shouldn't it be
something like this?

"The minimum amount of time to delay applying changes, in milliseconds"

And it might be better to mention the corresponding subscription parameter.

+ error. If <varname>wal_receiver_status_interval</varname> is set to
+ zero, the apply worker doesn't send any feedback messages during the
+ <literal>min_apply_delay</literal> period.

It took me a while to understand what this sentence means. I'd
like to suggest something like the following.

"Since no status-update messages are sent while delaying, note that
wal_receiver_status_interval is the only source of keepalive messages
during that period."

+ <para>
+ A logical replication subscription can delay the application of changes by
+ specifying the <literal>min_apply_delay</literal> subscription parameter.
+ See <xref linkend="sql-createsubscription"/> for details.
+ </para>

I'm not sure "logical replication subscription" is a common term.
Doesn't just "subscription" mean the same, especially in that context?
(Note that 31.2 starts with "A subscription is the downstream..").

+ Any delay occurs only on WAL records for transaction begins after all
+ initial table synchronization has finished. The delay is calculated

There are no "transaction begin" WAL records. Maybe it is "logical
replication transaction begin message". The timestamp is the "commit
time". (I took "transaction begins" as a noun, but that might be
wrong..)

+ may reduce the actual wait time. It is also possible that the overhead
+ already exceeds the requested <literal>min_apply_delay</literal> value,
+ in which case no additional wait is necessary. If the system clocks

I'm not sure it is right to say "necessary" here. IMHO it might be
better to say "in which case no delay is applied".
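
To make the "no delay is applied" case concrete, here is a rough sketch of
the arithmetic as I read the quoted text. It is not the patch's code: the
function name and the use of integer millisecond timestamps are assumptions
made only for this illustration.

```python
def remaining_delay_ms(commit_ts_ms: int, now_ms: int, min_apply_delay_ms: int) -> int:
    """Return how long the apply worker would still wait, in milliseconds.

    Hypothetical helper: commit_ts_ms is the commit timestamp assigned by the
    publisher, now_ms is the subscriber's current clock reading.
    """
    # Time already consumed by transfer and other overhead since the commit.
    elapsed_ms = now_ms - commit_ts_ms
    # If the overhead already exceeds the requested delay, nothing is left
    # to wait for, so the result is clamped at zero.
    return max(0, min_apply_delay_ms - elapsed_ms)
```

With a 4000 ms setting, a change arriving 1000 ms after commit would wait
another 3000 ms, while one arriving 5000 ms after commit would be applied
without any delay.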

+ in which case no additional wait is necessary. If the system clocks
+ on publisher and subscriber are not synchronized, this may lead to
+ apply changes earlier than expected, but this is not a major issue
+ because this parameter is typically much larger than the time
+ deviations between servers. Note that if this parameter is set to a

This doesn't seem to fit our documentation. It is not our business
whether a certain amount of deviation is critical or not. How about
something like the following?

"Note that the delay is measured between the timestamp assigned by
publisher and the system clock on subscriber. You need to manage the
system clocks to be in sync so that the delay works properly."
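
For illustration only, the effect of unsynchronized clocks can be sketched
as below. This is a simplified model, not anything taken from the patch: it
ignores transfer time, and the function name and the "offset from true time"
parameters are assumptions of mine.

```python
def effective_delay_ms(min_apply_delay_ms: int,
                       publisher_offset_ms: int,
                       subscriber_offset_ms: int) -> int:
    """Return the real-world delay a change experiences, in milliseconds.

    Hypothetical model: each offset is how far that server's clock runs
    ahead of true time (negative means it runs behind).
    """
    # The subscriber compares its own clock with the publisher's commit
    # timestamp, so only the difference between the two offsets matters.
    skew_ms = subscriber_offset_ms - publisher_offset_ms
    # A subscriber clock that runs ahead shortens the real delay; a clock
    # that runs behind lengthens it. The delay cannot go below zero.
    return max(0, min_apply_delay_ms - skew_ms)
```

In this model a subscriber clock running one minute ahead turns a 4-hour
setting into 3 hours 59 minutes of real delay, and a skew larger than the
setting removes the delay entirely.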

+ Delaying the replication can mean there is a much longer time
+ between making a change on the publisher, and that change being
+ committed on the subscriber. This can impact the performance of
+ synchronous replication. See <xref linkend="guc-synchronous-commit"/>
+ parameter.

Do we need the "can" in "Delaying the replication can mean"? If we
want to keep that nuance, it might be "Delaying the replication means
there can be a much longer..."?

+ <para>
+ Create a subscription to a remote server that replicates tables in
+ the <literal>mypub</literal> publication and starts replicating immediately
+ on commit. Pre-existing data is not copied. The application of changes is
+ delayed by 4 hours.
+<programlisting>
+CREATE SUBSCRIPTION mysub
+ CONNECTION 'host=192.0.2.4 port=5432 user=foo dbname=foodb'
+ PUBLICATION mypub
+ WITH (copy_data = false, min_apply_delay = '4h');
+</programlisting></para>

I'm not sure we need this additional example. We already have two
examples, one of which differs from the above only in the actual values
for the PUBLICATION and WITH clauses.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
