Re: Logical replication keepalive flood

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Abbas Butt <abbas(dot)butt(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zahid Iqbal <zahid(dot)iqbal(at)enterprisedb(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: Re: Logical replication keepalive flood
Date: 2021-09-30 05:55:58
Message-ID: CAA4eK1LYqmPyOzL4fbbXDTwd4fAy9bEP4VdYEDuSNgaHKgbCQg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 30, 2021 at 8:49 AM Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> I noticed another patch that Horiguchi-San posted earlier[1].
>
> The approach in that patch is to splits the sleep into two
> pieces. If the first sleep reaches the timeout then send a keepalive
> then sleep for the remaining time.
>
> I tested that patch and can see the keepalive message is reduced and
> the patch won't cause the current regression test fail.
>
> Since I didn't find some comments about that patch,
> I wonder did we find some problems about that patch ?
>

I am not able to understand some parts of that patch.

+ If the sleep is shorter
+ * than KEEPALIVE_TIMEOUT milliseconds, we skip sending a keepalive to
+ * prevent it from getting too-frequent.
+ */
+ if (MyWalSnd->flush < sentPtr &&
+ MyWalSnd->write < sentPtr &&
+ !waiting_for_ping_response)
+ {
+ if (sleeptime > KEEPALIVE_TIMEOUT)
+ {
+ int r;
+
+ r = WalSndWait(wakeEvents, KEEPALIVE_TIMEOUT,
+ WAIT_EVENT_WAL_SENDER_WAIT_WAL);
+
+ if (r != 0)
+ continue;
+
+ sleeptime -= KEEPALIVE_TIMEOUT;
+ }
+
+ WalSndKeepalive(false);

It claims to skip sending keepalive if the sleep time is shorter than
KEEPALIVE_TIMEOUT (a new threshold) but the above code seems to
suggest it sends in both cases. What am I missing?

Also, more to the point this special keep_alive seems to be sent for
synchronous replication and walsender shutdown as they can expect
updated locations. You haven't given any reason (theory) why those two
won't be impacted due to this change? I am aware that for synchronous
replication, we wake waiters while ProcessStandbyReplyMessage but I am
not sure how it helps with wal sender shutdown? I think we need to
know the reasons for this message and then need to see if the change
has any impact on the same.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2021-09-30 06:21:25 Re: Logical replication keepalive flood
Previous Message Masahiko Sawada 2021-09-30 05:45:20 Re: Skipping logical replication transactions on subscriber side