Re: Logical replication keepalive flood

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Abbas Butt <abbas(dot)butt(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zahid Iqbal <zahid(dot)iqbal(at)enterprisedb(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: Logical replication keepalive flood
Date: 2021-09-30 06:21:25
Message-ID: CAJcOf-c46iu9PUX5r+G8E4e6tYWJODKeewkVAPWdLcAaaVdvqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 30, 2021 at 3:56 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> I am not able to understand some parts of that patch.
>
> + If the sleep is shorter
> + * than KEEPALIVE_TIMEOUT milliseconds, we skip sending a keepalive to
> + * prevent it from getting too-frequent.
> + */
> + if (MyWalSnd->flush < sentPtr &&
> + MyWalSnd->write < sentPtr &&
> + !waiting_for_ping_response)
> + {
> + if (sleeptime > KEEPALIVE_TIMEOUT)
> + {
> + int r;
> +
> + r = WalSndWait(wakeEvents, KEEPALIVE_TIMEOUT,
> + WAIT_EVENT_WAL_SENDER_WAIT_WAL);
> +
> + if (r != 0)
> + continue;
> +
> + sleeptime -= KEEPALIVE_TIMEOUT;
> + }
> +
> + WalSndKeepalive(false);
>
> It claims to skip sending keepalive if the sleep time is shorter than
> KEEPALIVE_TIMEOUT (a new threshold) but the above code seems to
> suggest it sends in both cases. What am I missing?
>

The comment does seem to be wrong.
The way I see it, if the calculated sleeptime is greater than
KEEPALIVE_TIMEOUT, then it first sleeps for up to KEEPALIVE_TIMEOUT
milliseconds and skips sending a keepalive if something happens (i.e.
the socket becomes readable/writeable) during that time (WalSendWait
will return non-zero in that case). Otherwise, it sends a keepalive
and sleeps for (sleeptime - KEEPALIVE_TIMEOUT), then loops around
again ...

> Also, more to the point this special keep_alive seems to be sent for
> synchronous replication and walsender shutdown as they can expect
> updated locations. You haven't given any reason (theory) why those two
> won't be impacted due to this change? I am aware that for synchronous
> replication, we wake waiters while ProcessStandbyReplyMessage but I am
> not sure how it helps with wal sender shutdown? I think we need to
> know the reasons for this message and then need to see if the change
> has any impact on the same.
>

Yes, I'm not sure about the possible impacts, still looking at it.

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo NAGATA 2021-09-30 06:37:55 Re: Implementing Incremental View Maintenance
Previous Message Amit Kapila 2021-09-30 05:55:58 Re: Logical replication keepalive flood