Quick Links

Re: Logical replication keepalive flood

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Hou, Zhijie/侯志杰 <houzj(dot)fnst(at)fujitsu(dot)com>
Cc:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Abbas Butt <abbas(dot)butt(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zahid Iqbal <zahid(dot)iqbal(at)enterprisedb(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject:	Re: Logical replication keepalive flood
Date:	2021-09-30 05:55:58
Message-ID:	CAA4eK1LYqmPyOzL4fbbXDTwd4fAy9bEP4VdYEDuSNgaHKgbCQg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Sep 30, 2021 at 8:49 AM Hou, Zhijie/侯志杰 <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> I noticed another patch that Horiguchi-San posted earlier[1].
>
> The approach in that patch is to splits the sleep into two
> pieces. If the first sleep reaches the timeout then send a keepalive
> then sleep for the remaining time.
>
> I tested that patch and can see the keepalive message is reduced and
> the patch won't cause the current regression test fail.
>
> Since I didn't find some comments about that patch,
> I wonder did we find some problems about that patch ?
>

I am not able to understand some parts of that patch.

+ If the sleep is shorter
+ * than KEEPALIVE_TIMEOUT milliseconds, we skip sending a keepalive to
+ * prevent it from getting too-frequent.
+ */
+ if (MyWalSnd->flush < sentPtr &&
+ MyWalSnd->write < sentPtr &&
+ !waiting_for_ping_response)
+ {
+ if (sleeptime > KEEPALIVE_TIMEOUT)
+ {
+ int r;
+
+ r = WalSndWait(wakeEvents, KEEPALIVE_TIMEOUT,
+ WAIT_EVENT_WAL_SENDER_WAIT_WAL);
+
+ if (r != 0)
+ continue;
+
+ sleeptime -= KEEPALIVE_TIMEOUT;
+ }
+
+ WalSndKeepalive(false);

It claims to skip sending keepalive if the sleep time is shorter than
KEEPALIVE_TIMEOUT (a new threshold) but the above code seems to
suggest it sends in both cases. What am I missing?

Also, more to the point this special keep_alive seems to be sent for
synchronous replication and walsender shutdown as they can expect
updated locations. You haven't given any reason (theory) why those two
won't be impacted due to this change? I am aware that for synchronous
replication, we wake waiters while ProcessStandbyReplyMessage but I am
not sure how it helps with wal sender shutdown? I think we need to
know the reasons for this message and then need to see if the change
has any impact on the same.

--
With Regards,
Amit Kapila.

In response to

Re: Logical replication keepalive flood at 2021-09-19 05:46:49 from Amit Kapila

Responses

Re: Logical replication keepalive flood at 2021-09-30 06:21:25 from Greg Nancarrow

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Greg Nancarrow	2021-09-30 06:21:25	Re: Logical replication keepalive flood
Previous Message	Masahiko Sawada	2021-09-30 05:45:20	Re: Skipping logical replication transactions on subscriber side