RE: Time delayed LR (WAS Re: logical replication restrictions)

From: "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shveta(dot)malik(at)gmail(dot)com" <shveta(dot)malik(at)gmail(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2023-02-03 06:35:29
Message-ID: TYCPR01MB837337242FF0F5C50B997E76EDD79@TYCPR01MB8373.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Friday, February 3, 2023 2:21 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Fri, Feb 3, 2023 at 6:41 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > On Thu, Feb 2, 2023 at 7:21 PM Takamichi Osumi (Fujitsu)
> > <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> > >
> > ...
> > > > Besides, I am not sure it's a stable test to check the log. Is it
> > > > possible that there's no such log on a slow machine? I modified
> > > > the code to sleep 1s at the beginning of apply_dispatch(), then
> > > > the new added test failed because the server log cannot match.
> > > To get the log by itself is necessary to ensure that the delay is
> > > conducted by the apply worker, because we emit the diffms only if
> > > it's bigger than 0 in maybe_apply_delay(). If we omit the step, we
> > > are not sure the delay is caused by other reasons or the time-delayed
> feature.
> > >
> > > As you mentioned, it's possible that no log is emitted on slow
> > > machine. Then, the idea to make the test safer for such machines should
> be to make the delayed time longer.
> > > But we shortened the delay time to 1 second to mitigate the long test
> execution time of this TAP test.
> > > So, I'm not sure if it's a good idea to make it longer again.
> >
> > I think there are a couple of things that can be done about this problem:
> >
> > 1. If you need the code/test to remain as-is then at least the test
> > message could include some comforting text like "(this can fail on
> > slow machines when the delay time is already exceeded)" so then a test
> > failure will not cause undue alarm.
> >
> > 2. Try moving the DEBUG2 elog (in function maybe_apply_delay) so that
> > it will *always* log the remaining wait time even if that wait time
> > becomes negative. Then I think the test cases can be made
> > deterministic instead of relying on good luck. This seems like the
> > better option.
> >
>
> I don't understand why we have to do any of this instead of using 3s as
> min_apply_delay similar to what we are doing in
> src/test/recovery/t/005_replay_delay. Also, I think we should use exactly the
> same way to verify the test even though we want to keep the log level as
> DEBUG2 to check logs in case of any failures.
OK, will try to make our tests similar to the tests in 005_replay_delay
as much as possible.

> Also, I don't see the need to add more tests like the ones below:
> +# Test whether ALTER SUBSCRIPTION changes the delayed time of the apply
> +worker # (1 day 5 minutes). Note that the extra 5 minute is to account
> +for any # decoding/network overhead.
>
> Let's try to add tests similar to what we have for recovery_min_apply_delay
> unless there is some functionality in this patch that is not there in the
> recovery_min_apply_delay feature.
The above command is a preparation part to check a behavior unique to time-delayed
logical replication, which is to DISABLE a subscription causes the apply worker not to apply
the suspended (delayed) transaction. So, it will be OK to have this test.

Best Regards,
Takamichi Osumi

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Jones 2023-02-03 07:05:46 Re: [PATCH] Add pretty-printed XML output option
Previous Message shveta malik 2023-02-03 06:34:58 Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication