Re: Logical replication keepalive flood

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: gregn4422(at)gmail(dot)com
Cc: amit(dot)kapila16(at)gmail(dot)com, Hou(at)gmail(dot)com, houzj(dot)fnst(at)fujitsu(dot)com, andres(at)anarazel(dot)de, abbas(dot)butt(at)enterprisedb(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, zahid(dot)iqbal(at)enterprisedb(dot)com, smithpb2250(at)gmail(dot)com
Subject: Re: Logical replication keepalive flood
Date: 2021-09-30 07:56:29
Message-ID: 20210930.165629.1243423254265459564.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Thu, 30 Sep 2021 16:21:25 +1000, Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote in
> On Thu, Sep 30, 2021 at 3:56 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > It claims to skip sending keepalive if the sleep time is shorter than
> > KEEPALIVE_TIMEOUT (a new threshold) but the above code seems to
> > suggest it sends in both cases. What am I missing?
> >
>
> The comment does seem to be wrong.

Mmm. Indeed the comment does say something like that... Looking the
patch name together, I might have confused something. However, the
patch looks like working for the purpose.

> The way I see it, if the calculated sleeptime is greater than
> KEEPALIVE_TIMEOUT, then it first sleeps for up to KEEPALIVE_TIMEOUT
> milliseconds and skips sending a keepalive if something happens (i.e.
> the socket becomes readable/writeable) during that time (WalSendWait
> will return non-zero in that case). Otherwise, it sends a keepalive
> and sleeps for (sleeptime - KEEPALIVE_TIMEOUT), then loops around
> again ...

The maim point of the patch is moving of the timing of sending the
before-sleep keepalive. It seems to me that currently
WalSndWaitForWal may send "before-sleep" keepalive every run of the
loop under a certain circumstance. I suspect this happen in this case.

After the patch applied, that keepalive is sent only when the loop is
actually going to sleep some time. In case the next WAL doesn't come
for KEEPALIVE_TIMEOUT milliseconds, it sends a keepalive. There's a
dubious behavior when sleeptime <= KEEPALIVE_TIMEOUT that it sends a
keepalive immediately. It was (as far as I recall) intentional in
order to make the code simpler. However, on second thought, we will
have the next chance to send keepalive in that case, and intermittent
frequent keepalives can happen with that behavior. So I came to think
that we can omit keepalives at all that case.

(I myself haven't see the keepalive flood..)

> > Also, more to the point this special keep_alive seems to be sent for
> > synchronous replication and walsender shutdown as they can expect
> > updated locations. You haven't given any reason (theory) why those two
> > won't be impacted due to this change? I am aware that for synchronous
> > replication, we wake waiters while ProcessStandbyReplyMessage but I am
> > not sure how it helps with wal sender shutdown? I think we need to
> > know the reasons for this message and then need to see if the change
> > has any impact on the same.
> >
>
> Yes, I'm not sure about the possible impacts, still looking at it.

If the comment describes the objective correctly, the only possible
impact would be that there may be a case where server responds a bit
slowly for a shutdown request. But I'm not sure it is definitely
true.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
avoid_keepalive_flood_at_bleeding_edge_of_wal.patch text/x-patch 3.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2021-09-30 08:08:35 Re: Logical replication keepalive flood
Previous Message Andres Freund 2021-09-30 07:51:52 Re: prevent immature WAL streaming