Re: Walsender timeouts and large transactions

From: Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, craig(at)2ndquadrant(dot)com
Subject: Re: Walsender timeouts and large transactions
Date: 2017-08-10 11:20:17
Message-ID: a73b350428359be38adc3fc7c99dcf78@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017-08-09 16:23, Petr Jelinek wrote:
> On 02/08/17 19:35, Yura Sokolov wrote:
>> The following review has been posted through the commitfest
>> application:
>> make installcheck-world: tested, passed
>> Implements feature: not tested
>> Spec compliant: not tested
>> Documentation: not tested
>>
>> There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout
>> <= 0) as in other places
>> (in WalSndKeepaliveIfNecessary for example).
>>
>> I don't think moving update of 'now' down to end of loop body is
>> correct:
>> there are calls to ProcessConfigFile with SyncRepInitConfig,
>> ProcessRepliesIfAny that can
>> last non-negligible time. It could lead to over sleeping due to larger
>> computed sleeptime.
>> Though I could be mistaken.
>>
>> I'm not sure about moving `if (!pg_is_send_pending())` in a body loop
>> after WalSndKeepaliveIfNecessary.
>> Is it necessary? But it looks harmless at least.
>>
>
> We also need to do actual timeout handing so that the timeout is not
> deferred to the end of the transaction (Which is why I moved `if
> (!pg_is_send_pending())` under WalSndCheckTimeOut() and
> WalSndKeepaliveIfNecessary() calls).
>

If standby really stalled, then it will not read from socket, and then
`pg_is_sendpending` eventually will return false, and timeout will be
checked.
If standby doesn't stall, then `last_reply_timestamp` will be updated in
`ProcessRepliesIfAny`, and so timeout will not be triggered.
Do I understand correctly, or I missed something?

>> Could patch be reduced to check after first `if
>> (!pg_is_sendpending())` ? like:
>>
>> if (!pq_is_send_pending())
>> - return;
>> + {
>> + if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0)
>> + {
>> + CHECK_FOR_INTERRUPTS();
>> + return;
>> + }
>> + if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp,
>> wal_sender_timeout / 2))
>> + return;
>> + }
>>
>> If not, what problem prevents?
>
> We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending
> so that it's possible to stop walsender while it's processing large
> transaction.

In this case CHECK_FOR_INTERRUPTS will be called after
wal_sender_timeout/2.
(This diff is for first appearance of `pq_is_send_pending` in a
function).

If it should be called more often, then patch could be simplier:

if (!pq_is_send_pending())
- return;
+ {
+ CHECK_FOR_INTERRUPTS();
+ if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0 ||
+ now <= TimestampTzPlusMilliseconds(last_reply_timestamp,
wal_sender_timeout / 2))
+ return;
+ }

(Still, I could be mistaken, it is just suggestion).

Is it hard to add test for case this patch fixes?

With regards,
--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2017-08-10 12:00:22 Re: Partition-wise join for join between (declaratively) partitioned tables
Previous Message Jeevan Ladhe 2017-08-10 10:56:12 Re: Default Partition for Range