RE: Time delayed LR (WAS Re: logical replication restrictions)

From: "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "smithpb2250(at)gmail(dot)com" <smithpb2250(at)gmail(dot)com>
Subject: RE: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2022-12-22 06:01:49
Message-ID: TYCPR01MB83730A3E21E921335F6EFA38EDE89@TYCPR01MB8373.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hi,

On Thursday, December 15, 2022 12:53 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Dec 15, 2022 at 7:16 AM Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
> wrote:
> >
> > At Wed, 14 Dec 2022 10:46:17 +0000, "Hayato Kuroda (Fujitsu)"
> > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote in
> > > I have implemented and tested that workers wake up per
> > > wal_receiver_timeout/2 and send keepalive. Basically it works well, but I
> > > found two problems.
> > > Do you have any good suggestions about them?
> > >
> > > 1)
> > >
> > > With this PoC at present, workers calculate sending intervals based
> > > on their wal_receiver_timeout, and sending is suppressed when the
> > > parameter is set to zero.
> > >
> > > This means that walsender may time out when wal_sender_timeout on
> > > the publisher and wal_receiver_timeout on the subscriber differ.
> > > Suppose that wal_sender_timeout is 2min, wal_receiver_timeout is
> > > 5min,
> >
> > It seems to me wal_receiver_status_interval is better for this use.
> > It's enough for us to document that "wal_receiver_status_interval
> > should be shorter than wal_sender_timeout/2, especially when a logical
> > replication connection is using min_apply_delay. Otherwise you will
> > suffer repeated termination of walsender".
> >
>
> This sounds reasonable to me.
Okay, I changed the wake-up interval to wal_receiver_status_interval
and added this description of the timeout to the documentation.

FYI, wal_receiver_status_interval by definition specifies
the minimum frequency for the WAL receiver process to send information
to the upstream. So I used the value directly as the WaitLatch timeout.
The wording of my documentation change follows that definition.
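
To illustrate why the interval matters, here is a minimal Python sketch. It is
not the patch code; it is only a model of the timing discussed above. The
function names are hypothetical, and two things are assumptions of the sketch:
that an interval of zero means no periodic wake-up, and that walsender gives up
once the gap between two messages reaches wal_sender_timeout.

```python
def keepalive_times(min_apply_delay_ms, status_interval_ms):
    """Offsets (ms) at which a delaying worker wakes up and sends feedback.

    Hypothetical model: the worker waits in slices of status_interval_ms
    until min_apply_delay_ms has elapsed.
    """
    if status_interval_ms <= 0:
        return []  # assumption: zero disables the periodic wake-up
    times = []
    elapsed = 0
    while elapsed < min_apply_delay_ms:
        remaining = min_apply_delay_ms - elapsed
        elapsed += min(remaining, status_interval_ms)
        if elapsed < min_apply_delay_ms:
            times.append(elapsed)  # woke up mid-delay: send a keepalive
    return times


def walsender_survives(min_apply_delay_ms, status_interval_ms,
                       wal_sender_timeout_ms):
    """True if no gap between messages reaches wal_sender_timeout_ms."""
    if wal_sender_timeout_ms <= 0:
        return True  # timeout disabled on the publisher
    last = 0
    # The final entry models the normal feedback sent once the delay ends.
    for t in keepalive_times(min_apply_delay_ms, status_interval_ms) + [min_apply_delay_ms]:
        if t - last >= wal_sender_timeout_ms:
            return False
        last = t
    return True


MIN = 60_000  # one minute in milliseconds

# The asymmetric setup quoted in this thread: wake-ups every 2.5min
# against a 2min wal_sender_timeout and a 10min min_apply_delay.
print(walsender_survives(10 * MIN, 150_000, 2 * MIN))

# Default values: wal_receiver_status_interval = 10s, wal_sender_timeout = 60s.
print(walsender_survives(10 * MIN, 10_000, 60_000))
```

In this model the first case fails because the first keepalive would only be
sent after 2.5min, while the second case keeps every gap well under the
suggested wal_sender_timeout/2.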

> > > and min_apply_delay is 10min. The worker on the subscriber will wake up
> > > every 2.5min and send keepalives, but walsender exits before the message
> > > arrives at the publisher.
> > >
> > > One idea to avoid that is to send the min_apply_delay subscriber
> > > option to the publisher and compare them, but it may not be sufficient,
> > > because the XXX_timeout GUC parameters could be modified later.
> >
> > # Anyway, I don't think such asymmetric setup is preferable.
> >
> > > 2)
> > >
> > > The issue reported by Vignesh-san[1] still remains. I have
> > > already analyzed it [2]: the root cause is that the flushed WAL
> > > position is not updated and sent to the publisher. Even if workers send
> > > keepalive messages to the publisher during the delay, the flushed
> > > position cannot be modified.
> >
> > I didn't look closely, but I guess the cause is that walsender doesn't
> > die until all WAL has been sent, while the logical delay chokes the
> > replication stream.
For issue (2), a separate thread has been started in [1].
I'll leave any further changes on this point to that thread.

Attached is the updated patch.
As before, for the TAP tests I rely on a base patch, shared in another thread [2],
that wakes up logical replication workers.

[1] - https://www.postgresql.org/message-id/TYAPR01MB586668E50FC2447AD7F92491F5E89@TYAPR01MB5866.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/flat/20221122004119.GA132961%40nathanxps13

Best Regards,
Takamichi Osumi

Attachment Content-Type Size
v11-0001-wake-up-logical-workers-as-needed-instead-of-rel.patch application/octet-stream 6.4 KB
v11-0002-Time-delayed-logical-replication-subscriber.patch application/octet-stream 70.7 KB
