Re: Time delayed LR (WAS Re: logical replication restrictions)

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: kuroda(dot)hayato(at)fujitsu(dot)com
Cc: amit(dot)kapila16(at)gmail(dot)com, osumi(dot)takamichi(at)fujitsu(dot)com, vignesh21(at)gmail(dot)com, euler(at)eulerto(dot)com, m(dot)melihmutlu(at)gmail(dot)com, andres(at)anarazel(dot)de, marcos(at)f10(dot)com(dot)br, pgsql-hackers(at)postgresql(dot)org, smithpb2250(at)gmail(dot)com
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2022-12-15 01:46:11
Message-ID: 20221215.104611.330470611359597283.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Wed, 14 Dec 2022 10:46:17 +0000, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> wrote in
> I have implemented and tested that workers wake up every wal_receiver_timeout/2
> and send keepalive. Basically it works well, but I found two problems.
> Do you have any good suggestions about them?
>
> 1)
>
> With this PoC at present, workers calculate the sending interval from their
> wal_receiver_timeout, and sending is suppressed when the parameter is set to zero.
>
> This means that walsender may time out when wal_sender_timeout on the
> publisher and wal_receiver_timeout on the subscriber differ.
> Supposing that wal_sender_timeout is 2min, wal_receiver_timeout is 5min,

It seems to me wal_receiver_status_interval is better for this use.
It's enough for us to document that "wal_receiver_status_interval
should be shorter than wal_sender_timeout/2, especially when the
logical replication connection is using min_apply_delay. Otherwise you
will suffer repeated termination of walsender".
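
To make that rule of thumb concrete, here is a minimal sketch in Python.
It is not code from the patch, and status_interval_is_safe is a
hypothetical helper name. It assumes both settings are given in seconds
and that zero means "disabled" for each, as with the corresponding GUCs.

```python
def status_interval_is_safe(status_interval_s: float,
                            sender_timeout_s: float) -> bool:
    """Check the proposed rule: status interval < wal_sender_timeout / 2.

    Hypothetical helper for illustration only; both arguments are in seconds.
    """
    if sender_timeout_s == 0:
        # wal_sender_timeout = 0 disables the timeout, so nothing can expire.
        return True
    if status_interval_s == 0:
        # A zero interval disables periodic status updates, so the
        # walsender would never hear from a delayed worker.
        return False
    return status_interval_s < sender_timeout_s / 2


# The quoted scenario: feedback every 2.5min against a 2min sender timeout.
print(status_interval_is_safe(150, 120))
# A 10s interval against a 60s sender timeout.
print(status_interval_is_safe(10, 60))
```

With the numbers from the quoted scenario the check fails, which matches
the walsender exiting before the keepalive arrives.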

> and min_apply_delay is 10min. The worker on the subscriber will wake up every 2.5min and
> send keepalives, but walsender exits before the message arrives at the publisher.
>
> One idea to avoid that is to send the min_apply_delay subscriber option to the publisher
> and compare them, but that may not be sufficient, because XXX_timeout GUC parameters
> could be modified later.

# Anyway, I don't think such an asymmetric setup is preferable.

> 2)
>
> The issue reported by Vignesh-san[1] still remains. I have already analyzed it [2];
> the root cause is that the flushed WAL position is not updated and sent to the publisher. Even
> if workers send keepalive messages to the publisher during the delay, the flushed position
> cannot be modified.

I didn't look closely, but I guess the cause is that walsender doesn't
exit until all WAL has been sent, while the logical delay chokes the
replication stream. Allowing walsender to finish while ignoring
replication status wouldn't be great. One idea is to let logical
workers send a delaying status.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
