Re: logical replication restrictions

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Peter Smith" <smithpb2250(at)gmail(dot)com>
Cc: "Andres Freund" <andres(at)anarazel(dot)de>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>, "Marcos Pegoraro" <marcos(at)f10(dot)com(dot)br>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication restrictions
Date: 2022-08-01 12:07:47
Message-ID: acfc1946-a73e-4e9d-86b3-b19cba225a41@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 5, 2022, at 5:41 AM, Peter Smith wrote:
> Here are some review comments for your v4-0001 patch. I hope they are
> useful for you.
Thanks for your review.

> This thread name "logical replication restrictions" seems quite
> unrelated to the patch here. Maybe it's better to start a new thread
> otherwise nobody is going to recognise what this thread is really
> about.
I agree that the $SUBJECT does not describe the proposal. I decided that it is
not worth creating a thread because (i) there are some interaction and they
could be monitoring this thread and (ii) the CF entry has the correct
description.

> Similar to physical replication, a time-delayed copy of the data for
> logical replication is useful for some scenarios (specially to fix
> errors that might cause data loss).
I changed the commit message a bit.

> Maybe take some examples from the regression tests to show usage of
> the new parameter
I don't think an example is really useful in a commit message. If you are
checking this commit, it is a matter of reading the regression tests or
documentation to obtain an example of how to use it.

> I think this should say that the units are ms.
Unit included.

> Is the "integer" type here correct? It might eventually be stored as
> an integer, but IIUC (going by the tests) from the user point-of-view
> this parameter is really "text" type for representing ms or interval,
> right?
The internal representation is integer. The unit is correct. If you use units,
the format is text that what the section [1] calls "Numeric with Unit". Even
if the user is unsure about its usage, an example might help here.

> SUGGESTION
> As with the physical replication feature (recovery_min_apply_delay),
> it can be useful for logical replication to delay the data
> replication.
It is not "data replication", it is applying changes. I reworded that sentence.

> SUGGESTION
> Time spent in logical decoding and in transferring the transaction may
> reduce the actual wait time.
Changed.

> If the system clocks on publisher and subscriber are not
> + synchronized, this may lead to apply changes earlier than expected.
>
> Why just say "earlier than expected"? If the publisher's time is ahead
> of the subscriber then the changes might also be *later* than
> expected, right? So, perhaps it is better to just say "other than
> expected".
This sentence is similar to another one in the recovery_min_apply_delay. I want
to emphasize the fact that even if you use a 30-minute delay, it might apply a
change that happened 29 minutes 55 seconds ago. The main reason for this
feature is to avoid modifying changes *earlier*. If it applies the change 30
minutes 5 seconds, it is fine.

> Should there also be a big warning box about the impact if using
> synchronous_commit (like the other streaming replication page has this
> warning)?
Impact? Could you elaborate?

> I think there should be some examples somewhere showing how to specify
> this parameter. Maybe they are better added somewhere in "31.2
> Subscription" and xrefed from here.
I added one example in the CREATE SUBSCRIPTION. We can add an example in the
section 31.2, however, since it is a new chapter I think it lacks examples for
the other options too (streaming, two_phase, copy_data, ...). It could be
submitted as a separate patch IMO.

> I think there should be a default assignment to 0 (done where all the
> other supported option defaults are set)
It could for completeness. the memset() takes care of it. Anyway, I added it to
the beginning of the parse_subscription_options().

> + if (opts->min_apply_delay < 0)
> + ereport(ERROR,
> + errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> + errmsg("option \"%s\" must not be negative", "min_apply_delay"));
> +
>
> I thought this check only needs to be do within scope of the preceding
> if - (IsSet(supported_opts, SUBOPT_MIN_APPLY_DELAY) &&
> strcmp(defel->defname, "min_apply_delay") == 0)
Fixed.

> + /*
> + * If this subscription has been disabled and it has an apply
> + * delay set, wake up the logical replication worker to finish
> + * it as soon as possible.
> + */
> + if (!opts.enabled && sub->applydelay > 0)
>
> I did not really understand the logic why should the min_apply_delay
> override the enabled=false? It is a called *minimum* delay so if it
> ends up being way over the parameter value (because the subscription
> is disabled) then why does that matter?
It doesn't. The main point of this code (as I tried to explain in the comment)
is to kill the worker as soon as possible if you disable the subscription.
Isn't the comment clear?

> Subscription *MySubscription = NULL;
> static bool MySubscriptionValid = false;
> +TimestampTz MySubscriptionMinApplyDelayUntil = 0;
>
> Looking at the only usage of this variable (in apply_delay) and how it
> is used I did see why this cannot just be a local member of the
> apply_delay function?
Good catch. A previous patch used that variable outside that function scope.

> +/*
> + * Apply the informed delay for the transaction.
> + *
> + * A regular transaction uses the commit time to calculate the delay. A
> + * prepared transaction uses the prepare time to calculate the delay.
> + */
> +static void
> +apply_delay(TimestampTz ts)
>
> I didn't think it needs to mention here about the different kinds of
> transactions because where it comes from has nothing really to do with
> this function's logic.
Fixed.

> Refer to comment #14 about MySubscriptionMinApplyDelayUntil.
Fixed.

> It seems to rely on the spooling happening at the end. But won't this
> cause a problem later when/if the "parallel apply" patch [1] is pushed
> and the stream bgworkers are doing stuff on the fly instead of
> spooling at the end?
>
> Or are you expecting that the "parallel apply" feature should be
> disabled if there is any min_apply_delay parameter specified?
I didn't read the "parallel apply" patch yet.

> Let's keep the alphabetical order of the parameters in COMPLETE_WITH, as per [2]
Fixed.

> + int64 subapplydelay; /* Replication apply delay */
> +
>
> IMO the comment should mention the units "(ms)"
I'm not sure. It should be documented in the catalogs. It is an important
information for user-visible interface. There are a few places in the
documentation that the unit is mentioned.

> There are some test cases for CREATE SUBSCRIPTION but there are no
> test cases for ALTER SUBSCRIPTION changing this new parameter.
I added a test to cover ALTER SUBSCRIPTION and also for the disabling a
subscription that contains a delay set.

> I received the following error when trying to run these 'subscription' tests:
Fixed.

--
Euler Taveira
EDB https://www.enterprisedb.com/

Attachment Content-Type Size
v5-0001-Time-delayed-logical-replication-subscriber.patch text/x-patch 63.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2022-08-01 12:38:05 Re: NAMEDATALEN increase because of non-latin languages
Previous Message torikoshia 2022-08-01 12:05:45 Re: Should fix a comment referring to stats collector?