Re: statement_timeout is not working as expected with postgres_fdw

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: statement_timeout is not working as expected with postgres_fdw
Date: 2017-05-04 16:18:24
Message-ID: CAA4eK1Kj_QpSZd5gFCsD0pBWD--K0ZCdzXmFdZNyQ0rn-Oa+8Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 4, 2017 at 8:08 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, May 4, 2017 at 7:13 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
>>> - For bonus points, give pgfdw_exec_query() an optional timeout
>>> argument, and set it to 30 seconds or so when we're doing abort
>>> cleanup. If the timeout succeeds, it errors out (which again
>>> re-enters the abort path, dropping the connection and forcing outer
>>> transaction levels to abort as well).
>>
>> Why do we need this 30 seconds timeout for abort cleanup failures?
>> Isn't it sufficient if we propagate the error only on failures?
>
> Well, the problem is that right now we're waiting for vast amounts of
> time when the remote connection dies. First, the user notices that
> the command is running for too long and hits ^C. At that point, the
> connection is probably already dead, and the user's been waiting for a
> while already. Then, we wait for PQcancel() to time out. Then, we
> wait for PQexec() to time out. The result of that, as Suraj mentioned
> in the original email, is that it takes about 16 minutes for the
> session to recover when the connection dies, even with keepalive
> settings and connection timeout set all the way to the minimum. The
> goal here is to make it take a more reasonable time to recover.
> Without modifying libpq, we can't do anything about the need to wait
> for PQcancel() to time out, but we can improve the rest of it. If we
> removed that 30-second timeout, we would win in the case where either
> ABORT TRANSACTION or ROLLBACK; RELEASE eventually succeeds but takes
> more than 30 seconds. In that case, the change you are proposing
> would allow us to reuse the connection instead of dropping it, and
> would prevent a forced toplevel abort when subtransactions are in use.
> However, the cost would be that when one of those commands never
> succeeds, we would wait a lot longer before finishing a transaction
> abort.
>
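
For reference, here is a minimal standalone sketch of what such a
timeout-bounded query execution could look like, using only documented
libpq calls (the function name and the details are my assumption, not
what the patch actually does):

/*
 * Sketch only: run a query on an existing libpq connection, but give up
 * once "timeout_ms" milliseconds have elapsed, so abort cleanup cannot
 * hang indefinitely.  On timeout or error it returns NULL; the caller is
 * then expected to drop the connection, since a result may still be
 * pending on it.
 */
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

#include <libpq-fe.h>

PGresult *
exec_query_with_timeout(PGconn *conn, const char *query, int timeout_ms)
{
    struct timeval deadline;

    /* Compute an absolute deadline so repeated waits don't extend it. */
    gettimeofday(&deadline, NULL);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_usec += (timeout_ms % 1000) * 1000;
    if (deadline.tv_usec >= 1000000)
    {
        deadline.tv_sec += 1;
        deadline.tv_usec -= 1000000;
    }

    if (!PQsendQuery(conn, query))
        return NULL;            /* connection is already broken */

    while (PQisBusy(conn))
    {
        fd_set      rfds;
        struct timeval now, tv;
        long        remaining_ms;
        int         sock = PQsocket(conn);

        if (sock < 0)
            return NULL;

        gettimeofday(&now, NULL);
        remaining_ms = (deadline.tv_sec - now.tv_sec) * 1000L +
            (deadline.tv_usec - now.tv_usec) / 1000L;
        if (remaining_ms <= 0)
            return NULL;        /* timed out: give up on this connection */

        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);
        tv.tv_sec = remaining_ms / 1000;
        tv.tv_usec = (remaining_ms % 1000) * 1000;

        /* Wait for data, but never past the deadline (EINTR handling omitted). */
        if (select(sock + 1, &rfds, NULL, NULL, &tv) <= 0)
            return NULL;        /* timeout or select() failure */

        if (!PQconsumeInput(conn))
            return NULL;        /* read failed: connection is dead */
    }

    return PQgetResult(conn);   /* caller should also drain any further results */
}

With something along these lines, abort cleanup could bound each remote
command instead of waiting for PQexec() to time out on its own.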

As soon as the first command fails due to timeout, we will set
'abort_cleanup_failure', which will force the toplevel transaction to
abort and also prevent further statements from being executed.  The
patch is trying to enforce a 30-second timeout after statement
execution, so it has to wait till the command execution completes
anyway (irrespective of whether the command succeeds or fails).  It is
quite possible I am missing some point; could you explain in somewhat
more detail in which exact case the 30-second timeout will allow us to
abort the toplevel transaction, given that we already do that when a
statement fails?
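
To make sure we are looking at the same flow, this is roughly how I
read the proposal (the variable name is taken from this thread; the
rest below is my assumption, not the actual patch):

/*
 * Once any cleanup command fails or times out, remember that fact,
 * force the toplevel transaction to abort, and stop sending further
 * commands so the connection can simply be dropped.
 */
#include <stdbool.h>

#include <libpq-fe.h>

/* defined in the earlier sketch */
PGresult   *exec_query_with_timeout(PGconn *conn, const char *query,
                                    int timeout_ms);

static bool abort_cleanup_failure = false;

static void
abort_remote_transaction(PGconn *conn)
{
    PGresult   *res;

    if (abort_cleanup_failure)
        return;                 /* an earlier cleanup command already failed */

    /* Bound ABORT TRANSACTION by ~30 seconds instead of waiting forever. */
    res = exec_query_with_timeout(conn, "ABORT TRANSACTION", 30000);
    if (res == NULL || PQresultStatus(res) != PGRES_COMMAND_OK)
        abort_cleanup_failure = true;   /* drop connection, abort toplevel xact */
    if (res != NULL)
        PQclear(res);
}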

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
