RE: Time delayed LR (WAS Re: logical replication restrictions)

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "smithpb2250(at)gmail(dot)com" <smithpb2250(at)gmail(dot)com>
Subject: RE: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2022-12-14 10:46:17
Message-ID: TYAPR01MB5866360932F60714625192F9F5E09@TYAPR01MB5866.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Horiguchi-san, Amit,

> > On Tue, Dec 13, 2022 at 7:35 AM Kyotaro Horiguchi
> > <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > >
> > > At Mon, 12 Dec 2022 18:10:00 +0530, Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com> wrote in
> > Yeah, I think ideally it will timeout but if we have a solution like
> > during delay, we keep sending ping messages time-to-time, it should
> > work fine. However, that needs to be verified. Do you see any reasons
> > why that won't work?

I have implemented and tested a PoC in which workers wake up every
wal_receiver_timeout/2 and send a keepalive. It basically works well, but I
found two problems. Do you have any good suggestions about them?

1)

In the current PoC, each worker calculates its sending interval from its own
wal_receiver_timeout, and sending is suppressed when the parameter is set to zero.

This means that the walsender may time out when wal_sender_timeout on the
publisher and wal_receiver_timeout on the subscriber differ.
Suppose that wal_sender_timeout is 2min, wal_receiver_timeout is 5min,
and min_apply_delay is 10min. The worker on the subscriber will wake up every
2.5min and send keepalives, but the walsender exits before the message arrives
at the publisher.
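
To make the arithmetic concrete, here is a rough sketch of the timing in
Python. It is a simplified model and not part of the PoC patch: the function
name is hypothetical, and it assumes that the walsender gives up after
wal_sender_timeout of silence, that the worker sends a keepalive every
wal_receiver_timeout/2 during the delay, and that network latency is zero.

```python
def walsender_times_out(wal_sender_timeout_s, wal_receiver_timeout_s,
                        min_apply_delay_s):
    """Return True if, in this simplified model, the walsender sees a
    silence at least as long as its timeout while the worker is delaying.

    All arguments are in seconds. This is a hypothetical helper for
    illustration, not a PostgreSQL function.
    """
    if wal_sender_timeout_s == 0:
        # A value of zero disables the timeout on the publisher.
        return False
    if wal_receiver_timeout_s == 0:
        # The PoC suppresses keepalives, so the silence lasts the whole delay.
        longest_silence_s = min_apply_delay_s
    else:
        # The worker wakes up every wal_receiver_timeout/2 and sends a keepalive.
        keepalive_interval_s = wal_receiver_timeout_s / 2
        longest_silence_s = min(keepalive_interval_s, min_apply_delay_s)
    return longest_silence_s >= wal_sender_timeout_s


# The case above: 2min sender timeout, 5min receiver timeout, 10min delay.
print(walsender_times_out(120, 300, 600))
```

With the settings swapped (wal_sender_timeout 5min, wal_receiver_timeout 2min),
the same model reports no timeout, because keepalives arrive every 1min.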

One idea to avoid that is to send the min_apply_delay subscription option to the
publisher and compare them, but that may not be sufficient, because the
XXX_timeout GUC parameters could be modified later.

2)

The issue reported by Vignesh-san [1] still remains. I have already analyzed it [2]:
the root cause is that the flushed WAL position is not updated and sent to the
publisher. Even if workers send keepalive messages to the publisher during the
delay, the flushed position cannot be modified.

[1]: https://www.postgresql.org/message-id/CALDaNm1vT8qNBqHivtAgYur-5-YkwF026VHtw9srd4fsdeaufA%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/TYAPR01MB5866F6BE7399E6343A96E016F51C9%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
