Re: Walsender timeouts and large transactions

From: Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, pjmodos(at)pjmodos(dot)net
Subject: Re: Walsender timeouts and large transactions
Date: 2017-09-27 11:28:37
Message-ID: 90bb67da7131e6186b50897c4b0f0ec3@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote:
> Hello,
>
> At Wed, 06 Sep 2017 13:46:16 +0000, Yura Sokolov
> <funny(dot)falcon(at)postgrespro(dot)ru> wrote in
> <20170906134616(dot)18925(dot)88390(dot)pgcf(at)coridan(dot)postgresql(dot)org>
>> I've changed to "need review" to gain more attention from other.
>
> I understand that the problem here is too fast network prohibits
> walsender from sending replies.
>
> In physical replication, WAL records are sent as soon as written
> and the timeout is handled in the topmost loop in WalSndLoop. In
> logical replication, data is sent at once at commit time in most
> cases. So it can take a long time in ReorderBufferCommit without
> returning to WalSndLoop (or even XLogSendLogical).
>
> One problem here is that WalSndWriteData waits for the arrival of
> the next *WAL record* in the slow-ptah because it is called by
> cues of ReorderBuffer* functions (mainly *Commit) irrelevantly to
> WAL insertion. This is I think the root cause of this problem.
>
> On the other hand, it ought to take a sleep when network is
> stalled, in other words, data to send remains after a flush. We
> don't have a means to signal when the socket queue gets a new
> room for another bytes. However, I suppose that such slow network
> allows us to sleep several or several tens of milliseconds. Or,
> if we could know how many bytes ps_flush_if_writable() pushed,
> it's enough to wait only when the function returns pushing
> nothing.
>
> As the result, I think that the functions should be modified as
> the following.
>
> - Forcing slow-path if time elapses a half of a ping period is
> right. (GetCurrentTimestamp is anyway requried.)
>
> - The slow-path should not sleep waiting Latch. It should just
> pg_usleep() for maybe 1-10ms.
>
> - We should go to the fast path just after keepalive or response
> message has been sent. In other words, the "if (now <" block
> should be in the "for (;;)" loop. This avoids needless runs on
> the slow-path.
>
>
> It would be refactorable as the following.
>
> prepare for the send buffer;
>
> for (;;)
> {
> now = GetCurrentTimeStamp();
> if (now < )...
> {
> fast-path
> }
> else
> {
> slow-path
> }
> return if finished
> sleep for 1ms?
> }
>
>
> What do you think about this?
>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center

Good day, Petr, Kyotaro

I've created failing test for issue (0001-Add-failing-test...) .
It tests insertion of 20000 rows with 10ms wal_sender_timeout
(it fails in WalSndWriteData on master) and then deletion of
those rows with 1ms wal_sender_timeout (it fails in WalSndLoop).

Both Peter's patch and my simplified suggestion didn't pass the
test. I didn't checked Kyotaro's suggestion, though, cause I
didn't understand it well.

I've made patch that passes the test (0002-Fix-walsender...) .
(I've used Petr's commit message. Don't you mind, Petr?)

In WalSndWriteData it adds CHECK_FOR_INTERRUPTS to fastpath and
falls through to slow path after half of wal_sender_timeout as
were discussed.
In a slow path, it just skips fast exit on `!pq_is_send_pending()`
and check for timeout for the first loop iteration. And it sets
sleeptime to 1ms even if timeout were reached. It gives a chance
to receiver's response to arrive.

In WalSndLoop it also skips check for timeout first iteration after
send_data were called, and also sleeps at least 1 ms.

I'm not sure about correctness of my patch. Given test exists,
you may suggest better solutions, or improve this solution.

I'll set commitfest topic status to 'Need review' assuming
my patch could be reviewed.

--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

Attachment Content-Type Size
0001-Add-failing-test-wal_sender_timeout-logical-decoding.patch text/x-diff 4.4 KB
0002-Fix-walsender-timeouts-when-decoding-large-transacti.patch text/x-diff 5.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stas Kelvich 2017-09-27 11:46:23 Re: logical decoding of two-phase transactions
Previous Message amul sul 2017-09-27 11:07:22 Restrict concurrent update/delete with UPDATE of partition key