Re: Time delayed LR (WAS Re: logical replication restrictions)

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "shveta(dot)malik(at)gmail(dot)com" <shveta(dot)malik(at)gmail(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2023-02-08 07:21:11
Message-ID: CAHut+PtXmsRamBH9F7RVkcBzPaetKtcxCAcpoh6RkJOy=AAHWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are my review comments for v31-0001

======
doc/src/sgml/glossary.sgml

1.
+ <para>
+ Replication setup that applies time-delayed copy of the data.
+ </para>

That sentence seemed a bit strange to me.

SUGGESTION
Replication setup that delays the application of changes by a
specified minimum time-delay period.

======

src/backend/replication/logical/worker.c

2. maybe_apply_delay

+ if (wal_receiver_status_interval > 0 &&
+ diffms > wal_receiver_status_interval * 1000L)
+ {
+ WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+ wal_receiver_status_interval * 1000L,
+ WAIT_EVENT_RECOVERY_APPLY_DELAY);
+ send_feedback(last_received, true, false, true);
+ }

I felt that introducing another variable like:

long statusinterval_ms = wal_receiver_status_interval * 1000L;

would help here by doing 2 things:
1) The condition would be easier to read because the ms units would be the same
2) Won't need * 1000L repeated in two places.

Only, do take care to assign this variable in the right place in this
loop in case the configuration is changed.

======
src/test/subscription/t/001_rep_changes.pl

3.
+# Test time-delayed logical replication
+#
+# If the subscription sets min_apply_delay parameter, the logical replication
+# worker will delay the transaction apply for min_apply_delay milliseconds. We
+# look the time duration between tuples are inserted on publisher and then
+# changes are replicated on subscriber.

This comment and the other one appearing later in this test are both
explaining the same test strategy. I think both comments should be
combined into one big one up-front, like this:

SUGGESTION
If the subscription sets min_apply_delay parameter, the logical
replication worker will delay the transaction apply for
min_apply_delay milliseconds. We verify this by looking at the time
difference between a) when tuples are inserted on the publisher, and
b) when those changes are replicated on the subscriber. Even on slow
machines, this strategy will give predictable behavior.

~~

4.
+my $delay = 3;
+
+# Set min_apply_delay parameter to 3 seconds
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub_renamed SET (min_apply_delay = '${delay}s')");

IMO that "my $delay = 3;" assignment should be *after* the comment:

e.g.
+
+# Set min_apply_delay parameter to 3 seconds
+my $delay = 3;
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub_renamed SET (min_apply_delay = '${delay}s')");

~~~

5.
+# Make new content on publisher and check its presence in subscriber depending
+# on the delay applied above. Before doing the insertion, get the
+# current timestamp that will be used as a comparison base. Even on slow
+# machines, this allows to have a predictable behavior when comparing the
+# delay between data insertion moment on publisher and replay time on
subscriber.

Most of this comment is now redundant because this was already
explained in the big comment up-front (see #3). Only one useful
sentence is left.

SUGGESTION
Before doing the insertion, get the current timestamp that will be
used as a comparison base.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-02-08 07:23:34 Re: recovery modules
Previous Message Michael Paquier 2023-02-08 07:16:29 Re: Generating code for query jumbling through gen_node_support.pl