Re: Walsender timeouts and large transactions

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: funny(dot)falcon(at)postgrespro(dot)ru
Cc: pgsql-hackers(at)postgresql(dot)org, pjmodos(at)pjmodos(dot)net
Subject: Re: Walsender timeouts and large transactions
Date: 2017-09-12 08:28:34
Message-ID: 20170912.172834.159377870.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

At Wed, 06 Sep 2017 13:46:16 +0000, Yura Sokolov <funny(dot)falcon(at)postgrespro(dot)ru> wrote in <20170906134616(dot)18925(dot)88390(dot)pgcf(at)coridan(dot)postgresql(dot)org>
> I've changed it to "need review" to gain more attention from others.

I understand the problem here to be that a network that is too fast
prevents the walsender from sending replies.

In physical replication, WAL records are sent as soon as they are
written, and the timeout is handled in the topmost loop in
WalSndLoop. In logical replication, data is sent all at once at
commit time in most cases. So the walsender can spend a long time
in ReorderBufferCommit without returning to WalSndLoop (or even to
XLogSendLogical).

One problem here is that WalSndWriteData waits for the arrival of
the next *WAL record* in the slow path, even though it is called
from ReorderBuffer* functions (mainly *Commit) independently of
WAL insertion. I think this is the root cause of the problem.

On the other hand, it ought to sleep when the network is stalled,
in other words, when data to send remains after a flush. We don't
have a means to be signaled when the socket queue gets room for
more bytes. However, I suppose that such a slow network allows us
to sleep for several, or several tens of, milliseconds. Or, if we
could know how many bytes pq_flush_if_writable() pushed, it would
be enough to wait only when the function pushed nothing.
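To illustrate the "wait only when nothing was pushed" idea outside
the backend, here is a minimal sketch on a plain non-blocking POSIX
socket. It is not walsender code: flush_if_writable_count() and
send_all_with_backoff() are hypothetical helpers, and the real
pq_flush_if_writable() does not report a byte count today. The
sketch assumes send() with MSG_DONTWAIT and a 1 ms nanosleep() as
stand-ins for the backend facilities.

```c
#define _DEFAULT_SOURCE         /* expose nanosleep() etc. on glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: push as many bytes as the socket accepts right
 * now, without blocking.  Returns the number of bytes pushed (possibly
 * 0 when the socket queue is full), or -1 on a hard error.
 */
static ssize_t
flush_if_writable_count(int fd, const char *buf, size_t len)
{
    size_t pushed = 0;

    assert(buf != NULL || len == 0);
    while (pushed < len)
    {
        ssize_t n = send(fd, buf + pushed, len - pushed, MSG_DONTWAIT);

        if (n > 0)
        {
            pushed += (size_t) n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted: just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;              /* no room in the socket queue for now */
        return -1;              /* hard error */
    }
    return (ssize_t) pushed;
}

/*
 * Hypothetical sender loop: sleep ~1 ms only when a flush attempt
 * pushed nothing, i.e. only when the network is actually stalled.
 */
static int
send_all_with_backoff(int fd, const char *buf, size_t len, int *nsleeps)
{
    struct timespec one_ms = {0, 1000000L};
    size_t done = 0;

    while (done < len)
    {
        ssize_t n = flush_if_writable_count(fd, buf + done, len - done);

        if (n < 0)
            return -1;
        if (n == 0)
        {
            nanosleep(&one_ms, NULL);   /* stalled: back off briefly */
            (*nsleeps)++;
        }
        done += (size_t) n;
    }
    return 0;
}
```

On a fast network the loop never sleeps; it only backs off when
the kernel refuses every byte.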

As a result, I think the functions should be modified as
follows.

- Forcing the slow path once half of a ping period has elapsed is
right. (GetCurrentTimestamp() is required anyway.)

- The slow path should not sleep waiting on the latch. It should
just pg_usleep() for maybe 1-10 ms.

- We should go back to the fast path right after a keepalive or
response message has been sent. In other words, the "if (now <"
block should be inside the "for (;;)" loop. This avoids needless
runs through the slow path.
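The first and third points boil down to one time comparison. The
sketch below is a rough version of it. It assumes timestamps in
microseconds held in an int64, as PostgreSQL's TimestampTz is. It
also assumes the "ping period" is a millisecond setting such as
wal_sender_timeout. slow_path_due() and timestamp_us are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Microseconds, mirroring the int64 representation of TimestampTz. */
typedef int64_t timestamp_us;

/*
 * Hypothetical helper: true once at least half of the ping period has
 * passed since the last keepalive or reply was sent, so the caller
 * should leave the fast path and check replies, timeouts and keepalives.
 */
static bool
slow_path_due(timestamp_us now, timestamp_us last_ping_sent, int ping_period_ms)
{
    timestamp_us half_period_us;

    assert(ping_period_ms > 0);
    half_period_us = (timestamp_us) ping_period_ms * 1000 / 2;
    return now >= last_ping_sent + half_period_us;
}
```

Once the slow path has sent a keepalive and recorded its time as
last_ping_sent, the same check is false again. So the next pass
goes straight back to the fast path, which is the point of moving
the check inside the loop.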

It could be refactored as follows.

prepare for the send buffer;

for (;;)
{
    now = GetCurrentTimestamp();
    if (now < ...)
    {
        fast-path
    }
    else
    {
        slow-path
    }
    return if finished
    sleep for 1ms?
}
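To check that the loop shape behaves as intended, here is a small
self-contained simulation. It uses a fake clock and a fake socket
instead of the real backend. SimSender, write_data() and the sim_*
helpers are all hypothetical. The slow path only counts a
"keepalive"; reply processing and timeout checks are left out. It
also assumes the "sleep for 1ms?" step is taken only when a flush
pushed nothing, per the earlier point about pq_flush_if_writable().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t timestamp_us;

/* Simulated environment: a fake clock and a slowly draining socket. */
typedef struct
{
    timestamp_us now;            /* fake clock, microseconds */
    size_t       pending;        /* bytes still waiting to be sent */
    size_t       drain_per_try;  /* bytes the socket accepts per flush */
    timestamp_us flush_cost_us;  /* clock advance per flush attempt */
    int          stall_tries;    /* flushes that accept nothing at all */
    timestamp_us last_ping;      /* last keepalive/reply sent */
    int          ping_period_ms;
    int          keepalives;     /* keepalives sent on the slow path */
    int          sleeps;         /* 1 ms sleeps taken */
} SimSender;

static size_t
sim_flush_if_writable(SimSender *s)
{
    size_t n = 0;

    s->now += s->flush_cost_us;
    if (s->stall_tries > 0)
        s->stall_tries--;        /* network stalled: nothing accepted */
    else
    {
        n = s->pending < s->drain_per_try ? s->pending : s->drain_per_try;
        s->pending -= n;
    }
    return n;                    /* bytes pushed by this attempt */
}

static void
sim_sleep_1ms(SimSender *s)
{
    s->now += 1000;
    s->sleeps++;
}

static void
write_data(SimSender *s, size_t len)
{
    assert(s->ping_period_ms > 0);
    s->pending += len;           /* prepare the send buffer */

    for (;;)
    {
        timestamp_us now = s->now;   /* stands in for GetCurrentTimestamp() */
        size_t pushed;

        if (now < s->last_ping + (timestamp_us) s->ping_period_ms * 1000 / 2)
            pushed = sim_flush_if_writable(s);  /* fast path */
        else
        {
            /* slow path: send a keepalive, then resume the fast path */
            s->keepalives++;
            s->last_ping = now;
            pushed = sim_flush_if_writable(s);
        }

        if (s->pending == 0)
            return;              /* finished */
        if (pushed == 0)
            sim_sleep_1ms(s);    /* stalled: short sleep, no latch wait */
    }
}
```

Sending 100 bytes at 10 bytes per 100 ms flush with a 1 s ping
period enters the slow path exactly once, at the 500 ms mark. After
that it returns to the fast path and never sleeps. The real loop
would also give up on wal_sender_timeout, which the simulation does
not model.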

What do you think about this?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
