Re: Walsender timeouts and large transactions

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: petr(dot)jelinek(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, andres(at)anarazel(dot)de
Subject: Re: Walsender timeouts and large transactions
Date: 2017-05-30 09:02:19
Message-ID: 20170530.180219.188282269.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Thu, 25 May 2017 17:52:50 +0200, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> wrote in <e082a56a-fd95-a250-3bae-0fff93832510(at)2ndquadrant(dot)com>
> Hi,
>
> We have had issue with walsender timeout when used with logical decoding
> and the transaction is taking long time to be decoded (because it
> contains many changes)
>
> I was looking today at the walsender code and realized that it's because
> if the network and downstream are fast enough, we'll always take fast
> path in WalSndWriteData which does not do reply or keepalive processing
> and is only reached once the transaction has finished by other code. So
> paradoxically we die of timeout because everything was fast enough to
> never fall back to slow code path.
>
> I propose we only use fast path if the last processed reply is not older
> than half of walsender timeout, if it is then we'll force the slow code
> path to process the replies again. This is similar logic that we use to
> determine if to send keepalive message. I also added CHECK_INTERRUPRS
> call to fast code path because otherwise walsender might ignore them for
> too long on large transactions.
>
> Thoughts?

+ TimestampTz now = GetCurrentTimestamp();

I think it is not recommended to read the current time too
frequently, especially within a loop that hates slowness. (I
suppose that a loop that can fill up a send queue falls into that
category.) If you don't mind a certain amount of additional
complexity for eliminating the possible slowdown by the check,
timeout would be usable. Attached patch does almost the same
thing with your patch but without busy time check.

What do you think about this?

# I saw that SIGQUIT doens't work for active publisher, which I
# think mention in another thread.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
Fix-walsender-timeouts-by-timeout.patch text/x-patch 2.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2017-05-30 09:26:00 Re: [COMMITTERS] Re: pgsql: Code review focused on new node types added by partitioning supp
Previous Message Simon Riggs 2017-05-30 08:24:46 Re: pg_resetwal is broken if run from v10 against older version of PG data directory