Re: logical replication restrictions

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Melih Mutlu" <m(dot)melihmutlu(at)gmail(dot)com>
Cc: "Andres Freund" <andres(at)anarazel(dot)de>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>, "Marcos Pegoraro" <marcos(at)f10(dot)com(dot)br>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Peter Smith" <smithpb2250(at)gmail(dot)com>
Subject: Re: logical replication restrictions
Date: 2022-08-08 21:46:56
Message-ID: 85ee6c87-ca0b-4330-9a7a-8a6cd389f7e0@www.fastmail.com
Lists: pgsql-hackers

On Wed, Jul 13, 2022, at 2:34 PM, Melih Mutlu wrote:

[Sorry for the delay...]

>> 22. src/test/subscription/t/032_apply_delay.pl
>>
>> I received the following error when trying to run these 'subscription' tests:
>>
>> t/032_apply_delay.pl ............... No such class log_location at
>> t/032_apply_delay.pl line 49, near "my log_location"
>> syntax error at t/032_apply_delay.pl line 49, near "my log_location ="
>
> I'm having these errors too. Seems like some declarations are missing.
Fixed in v5.

>
>>> + specified amount of time. If this value is specified without units,
>>> + it is taken as milliseconds. The default is zero, adding no delay.
>>> + </para>
> I'm also having an issue when I give min_apply_delay parameter without units.
> I expect that if I set min_apply_delay to 5000 (without any unit), it will be interpreted as 5000 ms.
Good catch. I fixed it in v5.
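
To illustrate the intended behavior, here is a minimal sketch (not the patch
code) of how a value without units could be read as milliseconds. The function
name parse_delay_ms and the unit table are assumptions made for this example;
the suffixes mirror the ones PostgreSQL accepts for time-valued settings, minus
microseconds.

```python
import re

# Hypothetical helper for illustration only; it is not part of the patch.
# Multipliers convert each supported unit suffix to milliseconds.
_UNIT_TO_MS = {
    "ms": 1,
    "s": 1_000,
    "min": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
}


def parse_delay_ms(value: str) -> int:
    """Return the delay in milliseconds; a bare number is taken as ms."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value)
    if match is None:
        raise ValueError(f"invalid delay value: {value!r}")
    number = int(match.group(1))
    unit = match.group(2) or "ms"  # no unit given: default to milliseconds
    if unit not in _UNIT_TO_MS:
        raise ValueError(f"unknown time unit: {unit!r}")
    return number * _UNIT_TO_MS[unit]
```

With this sketch, "5000" and "5s" both yield 5000.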

>
> Lastly, I have a question about this delay during tablesync.
> It's stated here that apply delays are not for initial tablesync.
>
>>> <para>
>>> + The delay occurs only on WAL records for transaction begins and after
>>> + the initial table synchronization. It is possible that the
>>> + replication delay between publisher and subscriber exceeds the value
>>> + of this parameter, in which case no delay is added. Note that the
>>> + delay is calculated between the WAL time stamp as written on
>>> + publisher and the current time on the subscriber. Delays in logical
>>> + decoding and in transfer the transaction may reduce the actual wait
>>> + time. If the system clocks on publisher and subscriber are not
>>> + synchronized, this may lead to apply changes earlier than expected.
>>> + This is not a major issue because a typical setting of this parameter
>>> + are much larger than typical time deviations between servers.
>>> + </para>
>
> There might be a case where tablesync workers are in SYNCWAIT state and waiting for apply worker to tell them to CATCHUP.
> And if apply worker is waiting in apply_delay function, tablesync workers will be stuck at SYNCWAIT state and this might delay tablesync at least "min_apply_delay" amount of time or more.
> Is it something we would want? What do you think?
Good catch. That's an oversight. The apply worker should wait for the initial
table synchronization to finish before it starts applying the delay. The main
reason is the current logical replication worker design: it only closes the
tablesync workers after the catchup phase. As you noticed, we cannot impose the
delay as soon as the COPY finishes because the synchronization would then take
a long time to finish, possibly due to a lack of workers. Instead, let's wait
until all tables reach the READY state and then apply the delay. I added an
explanation for it.
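
A rough sketch of that rule, combined with the delay calculation described in
the documentation quoted above, could look like the following. This is an
illustration and not the patch code: the function name, its parameters, and the
use of plain millisecond integers for time stamps are assumptions. The only
borrowed detail is that 'r' is the READY state in pg_subscription_rel.

```python
from typing import Iterable

READY = "r"  # srsubstate value for a table that finished synchronization


def remaining_delay_ms(
    commit_ts_ms: int,
    now_ms: int,
    min_apply_delay_ms: int,
    table_states: Iterable[str],
) -> int:
    """Return how long the apply worker should still wait, in milliseconds.

    commit_ts_ms is the time stamp written on the publisher and now_ms is
    the current time on the subscriber (both hypothetical inputs here).
    """
    # Do not delay while any table is still synchronizing; otherwise the
    # tablesync workers would be stuck waiting for the apply worker.
    if any(state != READY for state in table_states):
        return 0

    # Time already spent in decoding and transfer counts toward the delay.
    elapsed_ms = now_ms - commit_ts_ms

    # If the replication lag already exceeds the parameter, add no delay.
    return max(0, min_apply_delay_ms - elapsed_ms)
```

For example, with all tables READY, a 5000 ms setting, and a transaction that
is already 2000 ms old, the sketch returns 3000. If any table is still in
another state, it returns 0.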

I also modified the test a bit to use the new function
wait_for_subscription_sync, introduced in commit
0c20dd33db1607d6a85ffce24238c1e55e384b49.

I attached v6.

--
Euler Taveira
EDB https://www.enterprisedb.com/

Attachment: v6-0001-Time-delayed-logical-replication-subscriber.patch (text/x-patch, 63.6 KB)
