Walsender timeouts and large transactions

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Walsender timeouts and large transactions
Date: 2017-05-25 15:52:50
Message-ID: e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com
Lists: pgsql-hackers

Hi,

We have had an issue with walsender timeouts when logical decoding is in
use and a transaction takes a long time to decode (because it contains
many changes).

I was looking at the walsender code today and realized that this happens
because, if the network and downstream are fast enough, we always take
the fast path in WalSndWriteData, which does not do any reply or
keepalive processing. That processing is otherwise only reached by other
code once the transaction has finished. So, paradoxically, we die of a
timeout because everything was fast enough to never fall back to the
slow code path.

I propose that we only use the fast path if the last processed reply is
not older than half of the walsender timeout; if it is older, we force
the slow code path so that replies get processed again. This is similar
to the logic we use to decide whether to send a keepalive message. I
also added a CHECK_FOR_INTERRUPTS call to the fast code path, because
otherwise walsender might ignore interrupts for too long on large
transactions.
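
To illustrate the rule described above, here is a minimal standalone
sketch in C. It is not the attached patch and does not use any walsender
internals: the names millis_t, can_use_fast_path and
count_slow_path_entries are hypothetical, time is modelled as plain
milliseconds, and treating a timeout of 0 or less as "disabled" is an
assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a timestamp type, in milliseconds. */
typedef int64_t millis_t;

/*
 * Return true if the fast path may be taken, i.e. the last processed
 * reply is not older than half of the timeout. A timeout of 0 or less
 * is treated as "timeout disabled" (an assumption of this sketch).
 */
bool
can_use_fast_path(millis_t now, millis_t last_reply, millis_t timeout_ms)
{
    if (timeout_ms <= 0)
        return true;
    return (now - last_reply) <= timeout_ms / 2;
}

/*
 * Simulate decoding one large transaction in which a change is written
 * every step_ms milliseconds and the network never blocks. Returns how
 * many times the slow path (reply processing) was forced.
 */
int
count_slow_path_entries(int nchanges, millis_t step_ms, millis_t timeout_ms)
{
    millis_t now = 0;
    millis_t last_reply = 0;
    int      slow = 0;

    assert(step_ms > 0);

    for (int i = 0; i < nchanges; i++)
    {
        now += step_ms;
        if (!can_use_fast_path(now, last_reply, timeout_ms))
        {
            /* Slow path: replies are processed, refreshing last_reply. */
            last_reply = now;
            slow++;
        }
    }
    return slow;
}
```

With a 60 second timeout and one change per second, the simulation
forces the slow path roughly every 30 seconds, so the last processed
reply never gets close to the full timeout even though the writes
themselves never block.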

Thoughts?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Fix-walsender-timeouts-when-decoding-large-transacti.patch binary/octet-stream 3.5 KB
