Re: Time delayed LR (WAS Re: logical replication restrictions)

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2022-12-06 08:00:07
Message-ID: CAHut+PsCF_a4Mpcdo5bcE-4YuYqzR2Kzj_v=3CFROTP2vnPECA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for patch v9-0001:

======

GENERAL

1. min_ prefix?

What's the significance of the "min_" prefix for this parameter? I'm
guessing the background is that at one time it was considered to be a
GUC so took a name similar to GUC recovery_min_apply_delay (??)

But in practice, I think it is meaningless and/or misleading. For
example, suppose the user wants to defer replication by 1hr. IMO it is
more natural to just say "defer replication by 1 hr" (aka
apply_delay='1hr') Clearly it means replication will take place about
1 hr into the future. OTHO saying "defer replication by a MINIMUM of 1
hr" (aka min_apply_delay='1hr') is quite vague because then it is
equally valid if the replication gets delayed by 1 hr or 2 hrs or 5
days or 3 weeks since all of those satisfy the minimum delay. The
implementation could hardwire a delay of INT_MAX ms but clearly,
that's not really what the user would expect.

~

So, I think this parameter should be renamed just as 'apply_delay'.

But, if you still decide to keep it as 'min_apply_delay' then there is
a lot of other code that ought to be changed to be consistent with
that name.
e.g.
- subapplydelay in catalogs.sgml --> subminapplydelay
- subapplydelay in system_views.sql --> subminapplydelay
- subapplydelay in pg_subscription.h --> subminapplydelay
- subapplydelay in dump.h --> subminapplydelay
- i_subapplydelay in pg_dump.c --> i_subminapplydelay
- applydelay member name of Form_pg_subscription --> minapplydelay
- "Apply Delay" for the column name displayed by describe.c --> "Min
apply delay"
- more...

(IMO the fact that so much code does not currently say 'min' at all is
just evidence that the 'min' prefix really didn't really mean much in
the first place)

======

doc/src/sgml/catalogs.sgml

2. Section 31.2 Subscription

+ <para>
+ Time delayed replica of subscription is available by indicating
+ <literal>min_apply_delay</literal>. See
+ <xref linkend="sql-createsubscription"/> for details.
+ </para>

How about saying like:

SUGGESTION
The subscriber replication can be instructed to lag behind the
publisher side changes by specifying the
<literal>min_apply_delay</literal> subscription parameter. See XXX for
details.

======

doc/src/sgml/ref/create_subscription.sgml

3. min_apply_delay

+ <para>
+ By default, subscriber applies changes as soon as possible. As with
+ the physical replication feature
+ (<xref linkend="guc-recovery-min-apply-delay"/>), it can be useful to
+ have a time-delayed logical replica. This parameter allows you to
+ delay the application of changes by a specified amount of time. If
+ this value is specified without units, it is taken as milliseconds.
+ The default is zero, adding no delay.
+ </para>

"subscriber applies" -> "the subscriber applies"

"allows you" -> "lets the user"

"The default is zero, adding no delay." -> "The default is zero (no delay)."

~

4.

+ larger than the time deviations between servers. Note that
+ in the case when this parameter is set to a long value, the
+ replication may not continue if the replication slot falls behind the
+ current LSN by more than <literal>max_slot_wal_keep_size</literal>.
+ See more details in <xref linkend="guc-max-slot-wal-keep-size"/>.
+ </para>

4a.
SUGGESTION
Note that if this parameter is set to a long delay, the replication
will stop if the replication slot falls behind the current LSN by more
than <literal>max_slot_wal_keep_size</literal>.

~

4b.
When it is rendered (like below) it looks a bit repetitive:
... if the replication slot falls behind the current LSN by more than
max_slot_wal_keep_size. See more details in max_slot_wal_keep_size.

~

IMO the previous sentence should include the link.

SUGGESTION
if the replication slot falls behind the current LSN by more than
<link linkend =
"guc-max-slot-wal-keep-size"><literal>max_slot_wal_keep_size</literal></link>.

~

5.

+ <para>
+ Synchronous replication is affected by this setting when
+ <varname>synchronous_commit</varname> is set to
+ <literal>remote_write</literal>; every <literal>COMMIT</literal>
+ will need to wait to be applied.
+ </para>

Yes, this deserves a big warning -- but I am just not quite sure of
the details. I think this impacts more than just "remote_rewrite" --
e.g. the same problem would happen if "synchronous_standby_names" is
non-empty.

I think this warning needs to be more generic to cover everything.
Maybe something like below

SUGGESTION:
Delaying the replication can mean there is a much longer time between
making a change on the publisher, and that change being committed on
the subscriber. This can have a big impact on synchronous replication.
See https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT

======

src/backend/commands/subscriptioncmds.c

6. parse_subscription_options

+ ms = interval_to_ms(interval);
+ if (ms < 0 || ms > INT_MAX)
+ ereport(ERROR,
+ errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+ errmsg("%lld ms is outside the valid range for option \"%s\"",
+ (long long) ms, "min_apply_delay"));

"for option" -> "for parameter"

======

src/backend/replication/logical/worker.c

7. apply_delay

+static void
+apply_delay(TimestampTz ts)

IMO having a delay is not the usual case. So, would a better name for
this function be 'maybe_delay'?

~

8.

+ * high value for the delay. This design is different from the physical
+ * replication (that applies the delay at commit time) mainly because write
+ * operations may allow some issues (such as bloat and locks) that can be
+ * minimized if it does not keep the transaction open for such a long time.

Something seems not quite right with this wording -- is there a better
way of describing this?

~

9.

+ /*
+ * Delay apply until all tablesync workers have reached READY state. If we
+ * allow the delay during the catchup phase, once we reach the limit of
+ * tablesync workers, it will impose a delay for each subsequent worker.
+ * It means it will take a long time to finish the initial table
+ * synchronization.
+ */
+ if (!AllTablesyncsReady())
+ return;

"Delay apply until..." -> "The min_apply_delay parameter is ignored until..."

~

10.

+ /*
+ * The worker may be waken because of the ALTER SUBSCRIPTION ...
+ * DISABLE, so the catalog pg_subscription should be read again.
+ */
+ if (!in_remote_transaction && !in_streamed_transaction)
+ {
+ AcceptInvalidationMessages();
+ maybe_reread_subscription();
+ }
+ }

"waken" -> "woken"

======

src/bin/psql/describe.c

11. describeSubscriptions

+ /* Origin and min_apply_delay are only supported in v16 and higher */
if (pset.sversion >= 160000)
appendPQExpBuffer(&buf,
- ", suborigin AS \"%s\"\n",
- gettext_noop("Origin"));
+ ", suborigin AS \"%s\"\n"
+ ", subapplydelay AS \"%s\"\n",
+ gettext_noop("Origin"),
+ gettext_noop("Apply delay"));

IIUC the psql command is supposed to display useful information to the
user, so I wondered if it is worthwhile to put the units in this
column header -- "Apply delay (ms)" instead of just "Apply delay"
because that would make it far easier to understand the meaning
without having to check the documentation to discover the units.

======

src/include/utils/timestamp.h

12.

+extern int64 interval_to_ms(const Interval *interval);
+

For consistency with the other interval conversion functions exposed
here maybe this one should have been called 'interval2ms'

======

src/test/subscription/t/032_apply_delay.pl

13.

IIUC this test is checking if a delay has occurred by inspecting the
debug logs to see if a certain code path including "logical
replication apply delay" is logged. I guess that is OK, but another
way might be to compare the actual timing values of the published and
replicated rows.

The publisher table can have a column with default now() and the
subscriber side table can have an *additional* column also with
default now(). After replication, those two timestamp values can be
compared to check if the difference exceeds the min_time_delay
parameter specified.

------

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-12-06 08:01:22 Re: Generate pg_stat_get_* functions with Macros
Previous Message Amit Kapila 2022-12-06 07:50:24 Re: Perform streaming logical transactions by background workers and parallel apply