From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Be more predictable about reporting "lock timeout" vs "statement
Date: 2016-05-27 14:40:39
Message-ID: E1b6IwJ-000772-96@gemulon.postgresql.org

Be more predictable about reporting "lock timeout" vs "statement timeout".

If both timeout indicators are set when we arrive at ProcessInterrupts,
we've historically just reported "lock timeout". However, some buildfarm
members have been observed to fail isolationtester's timeouts test by
reporting "lock timeout" when the statement timeout was expected to fire
first. The cause seems to be that the process is allowed to sleep longer
than expected (probably due to heavy machine load) so that the lock
timeout happens before we reach the point of reporting the error, and
then this arbitrary tiebreak rule does the wrong thing. We can improve
matters by comparing the scheduled timeout times to decide which error
to report.
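A minimal sketch of the tiebreak change, written as standalone C. It does not reproduce the actual code in postgres.c or timeout.c: the struct, enum and function names are hypothetical, times are plain microsecond counts rather than PostgreSQL's timestamp type, and breaking an exact tie toward the lock timeout is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one scheduled timeout (not a PostgreSQL type). */
typedef struct
{
    bool    fired;          /* has this timeout's indicator been set? */
    int64_t finish_time_us; /* when the timeout was scheduled to expire */
} SketchTimeout;

typedef enum
{
    REPORT_LOCK_TIMEOUT,
    REPORT_STATEMENT_TIMEOUT
} TimeoutReport;

/* Old rule: if both indicators are set, always report the lock timeout. */
TimeoutReport
choose_report_old(SketchTimeout lock, SketchTimeout stmt)
{
    assert(lock.fired || stmt.fired); /* caller only asks after a timeout */
    return lock.fired ? REPORT_LOCK_TIMEOUT : REPORT_STATEMENT_TIMEOUT;
}

/*
 * New rule: if both indicators are set, report whichever timeout was
 * scheduled to expire first.  An exact tie goes to the lock timeout
 * (assumption of this sketch).
 */
TimeoutReport
choose_report_new(SketchTimeout lock, SketchTimeout stmt)
{
    assert(lock.fired || stmt.fired);
    if (lock.fired && stmt.fired)
        return (stmt.finish_time_us < lock.finish_time_us)
            ? REPORT_STATEMENT_TIMEOUT
            : REPORT_LOCK_TIMEOUT;
    return lock.fired ? REPORT_LOCK_TIMEOUT : REPORT_STATEMENT_TIMEOUT;
}
```

For example, if the statement timeout is scheduled for 1,000,000 us and the lock timeout for 2,000,000 us, a backend that oversleeps and only checks at 2,500,000 us sees both indicators set: choose_report_old() returns REPORT_LOCK_TIMEOUT, while choose_report_new() returns REPORT_STATEMENT_TIMEOUT, matching the order in which the timeouts were actually due.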

I had originally proposed greatly reducing the 1-second window between
the two timeouts in the test cases. On reflection that is a bad idea,
at least for the case where the lock timeout is expected to fire first,
because that would assume that it takes negligible time to get from
statement start to the beginning of the lock wait. Thus, this patch
doesn't completely remove the risk of test failures on slow machines.
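The remaining risk can be made concrete with a little deadline arithmetic. The statement timeout is measured from statement start, while the lock timeout is measured from the start of the lock wait, so the lock timeout is due first only if the delay before the lock wait is no larger than the window between the two settings. The function name and the millisecond values below are hypothetical illustrations, not the settings used by the isolationtester timeouts test, and an exact tie is assumed to be reported as a lock timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if, under a "report whichever was
 * scheduled first" rule, a lock timeout would be reported.
 * All times are milliseconds measured from statement start:
 *
 *   statement deadline = statement_timeout_ms
 *   lock deadline      = lock_wait_start_ms + lock_timeout_ms
 *
 * Equivalently: lock timeout is reported iff
 *   lock_wait_start_ms <= statement_timeout_ms - lock_timeout_ms (the window).
 */
bool
lock_timeout_reported_first(int64_t statement_timeout_ms,
                            int64_t lock_timeout_ms,
                            int64_t lock_wait_start_ms)
{
    assert(lock_wait_start_ms >= 0); /* lock wait cannot precede statement */
    int64_t stmt_deadline = statement_timeout_ms;
    int64_t lock_deadline = lock_wait_start_ms + lock_timeout_ms;
    return lock_deadline <= stmt_deadline; /* tie goes to lock timeout */
}
```

With a 1000 ms window (statement 2000 ms, lock 1000 ms), reaching the lock wait 50 ms after statement start still yields a lock timeout, but a 1200 ms delay on an overloaded machine does not. Shrinking the window to 100 ms (statement 1100 ms, lock 1000 ms) means a delay of just 150 ms flips the result, which is why narrowing the window in the test cases would make failures more likely rather than less.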
Empirically, however, the case this handles is the one we are seeing
in the buildfarm. The explanation may be that the other case requires
the scheduler to take the CPU away from a busy process, whereas the
case fixed here only requires the scheduler to not give the CPU back
right away to a process that has been woken from a multi-second sleep
(and, perhaps, has been swapped out meanwhile).

Back-patch to 9.3 where the isolationtester timeouts test was added.

Discussion: <8693(dot)1464314819(at)sss(dot)pgh(dot)pa(dot)us>

Branch
------
REL9_5_STABLE

Details
-------
http://git.postgresql.org/pg/commitdiff/cea17ba07a93c0185aa9cbbf79ce9d3241b9c547

Modified Files
--------------
src/backend/tcop/postgres.c | 23 +++++++++++++++++++----
src/backend/utils/misc/timeout.c | 16 +++++++++++++++-
src/include/utils/timeout.h | 1 +
3 files changed, 35 insertions(+), 5 deletions(-)