Re: Logical replication keepalive flood

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hou(at)gmail(dot)com, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Abbas Butt <abbas(dot)butt(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zahid Iqbal <zahid(dot)iqbal(at)enterprisedb(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: Logical replication keepalive flood
Date: 2021-10-01 08:14:22
Message-ID: CAJcOf-e7go4w3DYDAfv7iLBELa8MGg4Kyys4tsPdVQq9R95KiA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 30, 2021 at 5:56 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> With the patch applied, that keepalive is sent only when the loop is
> actually going to sleep for some time. If no new WAL arrives within
> KEEPALIVE_TIMEOUT milliseconds, it sends a keepalive. There's a
> dubious behavior when sleeptime <= KEEPALIVE_TIMEOUT: it sends a
> keepalive immediately. As far as I recall, that was intentional, to
> keep the code simpler. However, on second thought, we will have
> another chance to send a keepalive in that case, and that behavior
> can cause intermittent bursts of frequent keepalives. So I now think
> we can omit the keepalive entirely in that case.
>
> (I myself haven't seen the keepalive flood.)
>
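A minimal sketch of the wait-loop decision described in the quoted paragraph, written as a simplified C model rather than the actual walsender code. It assumes `sleeptime_ms` is how long the loop would otherwise sleep before its next wakeup; `plan_wait`, `WaitPlan` and the 250 ms value of `KEEPALIVE_TIMEOUT_MS` are hypothetical, and the real patch logic may differ.

```c
#include <stdbool.h>

/* Hypothetical value for illustration only; not taken from the patch. */
#define KEEPALIVE_TIMEOUT_MS 250

/* What one iteration of the (simplified, hypothetical) wait loop does. */
typedef struct WaitPlan
{
    long wait_ms;              /* how long to sleep waiting for new WAL */
    bool keepalive_on_timeout; /* send a keepalive if that wait times out */
} WaitPlan;

/*
 * Model of the described behavior: a keepalive is only considered when the
 * loop is actually about to sleep.  If the planned sleep is no longer than
 * KEEPALIVE_TIMEOUT_MS, no keepalive is scheduled at all (the older version
 * sent one immediately), since a later iteration gets another chance.
 */
static WaitPlan
plan_wait(long sleeptime_ms)
{
    WaitPlan plan;

    if (sleeptime_ms <= 0)
    {
        /* Not going to sleep: no keepalive. */
        plan.wait_ms = 0;
        plan.keepalive_on_timeout = false;
    }
    else if (sleeptime_ms <= KEEPALIVE_TIMEOUT_MS)
    {
        /* Short sleep: omit the keepalive entirely. */
        plan.wait_ms = sleeptime_ms;
        plan.keepalive_on_timeout = false;
    }
    else
    {
        /* Long sleep: wait up to KEEPALIVE_TIMEOUT_MS for new WAL,
         * then send a keepalive if none arrived. */
        plan.wait_ms = KEEPALIVE_TIMEOUT_MS;
        plan.keepalive_on_timeout = true;
    }
    return plan;
}
```

In this model, a walsender sitting at the end of WAL with frequent short wakeups no longer emits a keepalive on each of them, which is the flood being discussed.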

I tried your updated patch
(avoid_keepalive_flood_at_bleeding_edge_of_wal.patch, rebased) and
also manually applied your previous keepalive-counting code
(count_keepalives2.diff.txt), adapted to the code updates.
I tested both the originally reported scenario (which used
pg_recvlogical) and a similar one using pub/sub of the pgbench_history
table. In both cases your patch very significantly reduced the number
of keepalives, so the keepalive flood is no longer seen.
I am still a little unsure about the impact on pg_recvlogical --endpos
functionality, as revealed by the regression test failure. I did try
to update pg_recvlogical so that --endpos doesn't rely on a keepalive,
but so far I haven't been successful. If the test is altered or
removed, then I think the documentation for pg_recvlogical --endpos
will need updating in some way.
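For context on why fewer keepalives could matter for --endpos, here is a simplified C model of the client-side stop check as I understand pg_recvlogical performs it: a data record past endpos is not written, a record exactly at endpos is written and then the client stops, and an otherwise idle client only learns that it can stop from a keepalive whose walEnd is at or past endpos. `check_endpos`, `Lsn`, `MsgKind` and `EndposDecision` are hypothetical stand-ins, not pg_recvlogical's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;            /* hypothetical stand-in for XLogRecPtr */
#define INVALID_LSN ((Lsn) 0)    /* "no --endpos given" */

typedef enum { MSG_DATA, MSG_KEEPALIVE } MsgKind;

typedef struct EndposDecision
{
    bool output_record;  /* write this data record to the output */
    bool stop;           /* endpos reached: flush and exit */
} EndposDecision;

/* lsn is the record's LSN for data, or the reported walEnd for keepalives. */
static EndposDecision
check_endpos(MsgKind kind, Lsn lsn, Lsn endpos)
{
    EndposDecision d = { false, false };

    if (kind == MSG_KEEPALIVE)
    {
        /* Only a keepalive with walEnd >= endpos tells an idle client
         * that nothing more can arrive before endpos. */
        if (endpos != INVALID_LSN && lsn >= endpos)
            d.stop = true;
        return d;
    }

    /* Data record beyond endpos: do not write it, just stop. */
    if (endpos != INVALID_LSN && lsn > endpos)
    {
        d.stop = true;
        return d;
    }

    d.output_record = true;
    /* A record exactly at endpos is written, then the client stops. */
    if (endpos != INVALID_LSN && lsn == endpos)
        d.stop = true;
    return d;
}
```

Under this model, a client whose last received record is before endpos has no other signal to exit, so suppressing keepalives at the end of WAL can leave it waiting, which would be consistent with the test failure mentioned above.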

Regards,
Greg Nancarrow
Fujitsu Australia