Re: Logical replication keepalive flood

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: abbas(dot)butt(at)enterprisedb(dot)com
Cc: amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, zahid(dot)iqbal(at)enterprisedb(dot)com
Subject: Re: Logical replication keepalive flood
Date: 2021-06-10 06:00:16
Message-ID: 20210610.150016.1709823354377067679.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Wed, 9 Jun 2021 17:32:25 +0500, Abbas Butt <abbas(dot)butt(at)enterprisedb(dot)com> wrote in
>
> On Wed, Jun 9, 2021 at 2:30 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > Are these keepalive messages sent at the same frequency even for
> > subscribers?
>
> Yes, I have tested it with one publisher and one subscriber.
> The moment I start a pgbench session I can see keepalive messages sent, and
> replied to by the subscriber, at the same frequency.
>
> > Basically, I wanted to check: if we have logical
> > replication set up between 2 nodes, do we send this flood of keep-alive
> > messages?
>
> Yes we do.
>
> > If not, then why is it different in the case of
> > pg_recvlogical?
>
> Nothing, the WAL sender behaviour is the same in both cases.
>
>
> > Is it possible that the write/flush location is not
> > updated at the pace at which we expect?

Yes. MyWalSnd->flush/write are updated fairly frequently, but
MyWalSnd->write still lags behind sentPtr by anywhere from thousands of
bytes up to a bit less than 1 block (1 block = 8192 bytes). (Flush lags
are larger than write lags, of course.)
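
For reference, the keepalive in question comes from the check at the
bottom of the wait loop in WalSndWaitForWal() in walsender.c. Roughly
(abridged and paraphrased from memory of the code around this time, so
the exact comments and surrounding lines may differ):

	/*
	 * Abridged from the wait loop in WalSndWaitForWal(): before
	 * sleeping, ping the receiver with the current flush location
	 * whenever it has not yet confirmed everything already sent.
	 */
	if (MyWalSnd->flush < sentPtr &&
		MyWalSnd->write < sentPtr &&
		!waiting_for_ping_response)
		WalSndKeepalive(false);

So any write/flush lag at all behind sentPtr is enough to trigger a
keepalive on a pass through the loop.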

I counted how many times keepalives were sent for each request length
passed to logical_read_xlog_page() during a 10-second pgbench run
replicating pgbench_history, using the attached change.

 size:  sent / notsent / calls: write lag / flush lag
    8:     3 /       6 /     3:      5960 /    348962
   16:     1 /       2 /     1:       520 /    201096
   24:  2425 /    4852 /  2461:      5259 /    293569
   98:     2 /       0 /    54:         5 /      1050
  187:     2 /       0 /    94:         0 /      1060
 4432:     1 /       0 /     1: 410473592 / 410473592
 7617:     2 /       0 /    27:       317 /     17133
 8280:     1 /       2 /     4:       390 /       390

Where,

size is the data length requested from logical_read_xlog_page()

sent is the number of keepalives sent in the loop in WalSndWaitForWal

notsent is the number of runs of the loop in WalSndWaitForWal without
sending a keepalive

calls is the number of calls to WalSndWaitForWal

write lag is the number of bytes MyWalSnd->write is behind sentPtr at
the first run of the loop, per call to logical_read_xlog_page().

flush lag is the same as the above, but for MyWalSnd->flush.

Maybe the size=4432 line is the first-time fetch of WAL.

So these numbers show that WalSndWaitForWal is called almost exclusively
when starting to fetch a record, and in that case the function runs the
loop three times and sends one keepalive on average.
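
That is consistent with where WalSndWaitForWal is called from: the
logical decoding page-read callback waits for WAL up to the end of the
requested range before reading. An abridged sketch of the relevant part
of logical_read_xlog_page() in walsender.c (not the exact source):

	/* make sure we have enough WAL available for the requested range */
	flushptr = WalSndWaitForWal(targetPagePtr + reqLen);

	/* fail if not (implies the walsender is shutting down) */
	if (flushptr < targetPagePtr + reqLen)
		return -1;

	/* read a whole page if available, otherwise only up to flushptr */
	if (targetPagePtr + XLOG_BLCKSZ <= flushptr)
		count = XLOG_BLCKSZ;
	else
		count = flushptr - targetPagePtr;

The "size" above is the reqLen of each such call.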

> Well, it is async replication. The receiver can choose to update LSNs at
> its own will, say at a 10-minute interval.
> It should only impact the size of WAL retained by the server.
>
> > Please see commit 41d5f8ad73
> > which seems to be talking about a similar problem.
> >
>
> That commit does not address this problem.

Yeah, at least for me, WalSndWaitForWal sends one keepalive per call.
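
Each of those keepalives is just the small 'k' CopyData message. A
sketch of what WalSndKeepalive() puts on the wire (approximate):

	/* 'k' message: sent-up-to LSN, current timestamp, reply-requested flag */
	resetStringInfo(&output_message);
	pq_sendbyte(&output_message, 'k');
	pq_sendint64(&output_message, sentPtr);
	pq_sendint64(&output_message, GetCurrentTimestamp());
	pq_sendbyte(&output_message, requestReply ? 1 : 0);
	pq_putmessage_noblock('d', output_message.data, output_message.len);

So each message is tiny, but the flood means one such message, and
usually a status reply from the subscriber, for almost every call to
WalSndWaitForWal.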

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
