Re: [HACKERS] make async slave to wait for lsn to be replayed

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Kartyshov Ivan <i(dot)kartyshov(at)postgrespro(dot)ru>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, dilipbalaut(at)gmail(dot)com, smithpb2250(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed
Date: 2024-03-28 23:15:13
Message-ID: CAPpHfduVwKZ=wVWaomWsjV3gKj6t6B96TpOOK4rw1F6wCib=xg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 22, 2024 at 3:45 PM Kartyshov Ivan
<i(dot)kartyshov(at)postgrespro(dot)ru> wrote:
> On 2024-03-20 12:11, Alexander Korotkov wrote:
> > On Wed, Mar 20, 2024 at 12:34 AM Kartyshov Ivan
> > <i(dot)kartyshov(at)postgrespro(dot)ru> wrote:
> >> > 4.2 With an unreasonably high future LSN, BEGIN command waits
> >> > unboundedly, shouldn't we check if the specified LSN is more than
> >> > pg_last_wal_receive_lsn() error out?
> >
> > I think limiting wait lsn by current received lsn would destroy the
> > whole value of this feature. The value is to wait till given LSN is
> > replayed, whether it's already received or not.
>
> Ok sounds reasonable, I`ll rollback the changes.
>
> > But I don't see a problem here. On the replica, it's out of our
> > control to check which lsn is good and which is not. We can't check
> > whether the lsn, which is in future for the replica, is already issued
> > by primary.
> >
> > For the case of wrong lsn, which could cause potentially infinite
> > wait, there is the timeout and the manual query cancel.
>
> Fully agree with this take.
>
> >> > 4.3 With an unreasonably high wait time, BEGIN command waits
> >> > unboundedly, shouldn't we restrict the wait time to some max
> > value,
> >> > say a day or so?
> >> > SELECT pg_last_wal_receive_lsn() + 1 AS future_receive_lsn \gset
> >> > BEGIN AFTER :'future_receive_lsn' WITHIN 100000;
> >>
> >> Good idea, I put it 1 day. But this limit we should to discuss.
> >
> > Do you think that specifying timeout in milliseconds is suitable? I
> > would prefer to switch to seconds (with ability to specify fraction of
> > second). This was expressed before by Alexander Lakhin.
>
> It sounds like an interesting idea. Please review the result.
>
> >> > https://github.com/macdice/redo-bench or similar tools?
> >
> > Ivan, could you do this?
>
> Yes, test redo-bench/crash-recovery.sh
> This patch on master
> 91.327, 1.973
> 105.907, 3.338
> 98.412, 4.579
> 95.818, 4.19
>
> REL_13-STABLE
> 116.645, 3.005
> 113.212, 2.568
> 117.644, 3.183
> 111.411, 2.782
>
> master
> 124.712, 2.047
> 117.012, 1.736
> 116.328, 2.035
> 115.662, 1.797
>
> Strange behavior, patched version is faster then REL_13-STABLE and
> master.

I've run this test on my machine with v13 of the path.

patched
53.663, 0.466
53.884, 0.402
54.102, 0.441

master
55.216, 0.441
54.52, 0.464
51.479, 0.438

It seems that difference is less than variance.

------
Regards,
Alexander Korotkov

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Euler Taveira 2024-03-28 23:37:54 Re: [HACKERS] make async slave to wait for lsn to be replayed
Previous Message Erik Wienhold 2024-03-28 23:02:11 Re: PSQL Should \sv & \ev work with materialized views?