Re: Time delayed LR (WAS Re: logical replication restrictions)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shveta(dot)malik(at)gmail(dot)com" <shveta(dot)malik(at)gmail(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2023-02-03 05:20:56
Message-ID: CAA4eK1K-JESJ_AfUtvNy+=rA4eXXRTz3o2O8TdVch9=r-7g1Pw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 3, 2023 at 6:41 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Thu, Feb 2, 2023 at 7:21 PM Takamichi Osumi (Fujitsu)
> <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> >
> ...
> >
> >
> > > Besides, I am not sure it's a stable test to check the log. Is it possible that there's
> > > no such log on a slow machine? I modified the code to sleep 1s at the beginning
> > > of apply_dispatch(), then the new added test failed because the server log
> > > cannot match.
> > To get the log by itself is necessary to ensure
> > that the delay is conducted by the apply worker, because we emit the diffms
> > only if it's bigger than 0 in maybe_apply_delay(). If we omit the step,
> > we are not sure the delay is caused by other reasons or the time-delayed feature.
> >
> > As you mentioned, it's possible that no log is emitted on slow machine. Then,
> > the idea to make the test safer for such machines should be to make the delayed time longer.
> > But we shortened the delay time to 1 second to mitigate the long test execution time of this TAP test.
> > So, I'm not sure if it's a good idea to make it longer again.
>
> I think there are a couple of things that can be done about this problem:
>
> 1. If you need the code/test to remain as-is then at least the test
> message could include some comforting text like "(this can fail on
> slow machines when the delay time is already exceeded)" so then a test
> failure will not cause undue alarm.
>
> 2. Try moving the DEBUG2 elog (in function maybe_apply_delay) so that
> it will *always* log the remaining wait time even if that wait time
> becomes negative. Then I think the test cases can be made
> deterministic instead of relying on good luck. This seems like the
> better option.
>

I don't understand why we have to do any of this instead of using 3s
as min_apply_delay similar to what we are doing in
src/test/recovery/t/005_replay_delay. Also, I think we should use
exactly the same way to verify the test even though we want to keep
the log level as DEBUG2 to check logs in case of any failures.

Also, I don't see the need to add more tests like the ones below:
+# Test whether ALTER SUBSCRIPTION changes the delayed time of the apply worker
+# (1 day 5 minutes). Note that the extra 5 minute is to account for any
+# decoding/network overhead.

Let's try to add tests similar to what we have for
recovery_min_apply_delay unless there is some functionality in this
patch that is not there in the recovery_min_apply_delay feature.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-02-03 05:35:48 Re: Weird failure with latches in curculio on v15
Previous Message Amit Kapila 2023-02-03 04:46:44 Re: Deadlock between logrep apply worker and tablesync worker