Re: Time delayed LR (WAS Re: logical replication restrictions)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: kuroda(dot)hayato(at)fujitsu(dot)com, osumi(dot)takamichi(at)fujitsu(dot)com, vignesh21(at)gmail(dot)com, euler(at)eulerto(dot)com, m(dot)melihmutlu(at)gmail(dot)com, andres(at)anarazel(dot)de, marcos(at)f10(dot)com(dot)br, pgsql-hackers(at)postgresql(dot)org, smithpb2250(at)gmail(dot)com
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2022-12-15 03:53:12
Message-ID: CAA4eK1Lq+h8qo+rqGU-E+hwJKAHYocV54y4pvou4rLysCgYD-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 15, 2022 at 7:16 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Wed, 14 Dec 2022 10:46:17 +0000, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> wrote in
> > I have implemented and tested that workers wake up per wal_receiver_timeout/2
> > and send keepalive. Basically it works well, but I found two problems.
> > Do you have any good suggestions about them?
> >
> > 1)
> >
> > With this PoC at present, workers calculate sending intervals based on its
> > wal_receiver_timeout, and it is suppressed when the parameter is set to zero.
> >
> > This means that there is a possibility that walsender is timeout when wal_sender_timeout
> > in publisher and wal_receiver_timeout in subscriber is different.
> > Supposing that wal_sender_timeout is 2min, wal_receiver_tiemout is 5min,
>
> It seems to me wal_receiver_status_interval is better for this use.
> It's enough for us to docuemnt that "wal_r_s_interval should be
> shorter than wal_sener_timeout/2 especially when logical replication
> connection is using min_apply_delay. Otherwise you will suffer
> repeated termination of walsender".
>

This sounds reasonable to me.

> > and min_apply_delay is 10min. The worker on subscriber will wake up per 2.5min and
> > send keepalives, but walsender exits before the message arrives to publisher.
> >
> > One idea to avoid that is to send the min_apply_delay subscriber option to publisher
> > and compare them, but it may be not sufficient. Because XXX_timout GUC parameters
> > could be modified later.
>
> # Anyway, I don't think such asymmetric setup is preferable.
>
> > 2)
> >
> > The issue reported by Vignesh-san[1] has still remained. I have already analyzed that [2],
> > the root cause is that flushed WAL is not updated and sent to the publisher. Even
> > if workers send keepalive messages to pub during the delay, the flushed position
> > cannot be modified.
>
> I didn't look closer but the cause I guess is walsender doesn't die
> until all WAL has been sent, while logical delay chokes replication
> stream.
>

Right, I also think so.

> Allowing walsender to finish ignoring replication status
> wouldn't be great.
>

Yes, that would be ideal. But do you know why that is a must?

> One idea is to let logical workers send delaying
> status.
>

How can that help?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-12-15 04:14:56 Re: Time delayed LR (WAS Re: logical replication restrictions)
Previous Message Amit Kapila 2022-12-15 03:48:55 Re: Time delayed LR (WAS Re: logical replication restrictions)