Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.
Date: 2016-02-11 14:48:39
Message-ID: CA+TgmoYrakoJzjNm_QpAZ4wktTG=MbPVJZiKETGeeXWaQ1N=1Q@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Thu, Feb 11, 2016 at 9:36 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Feb 11, 2016 at 9:29 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <rhaas(at)postgresql(dot)org> writes:
>>> Add some isolation tests for deadlock detection and resolution.
>>
>> Buildfarm says this needs work ...
>>
>> dromedary is one of mine, do you need me to look into what is
>> happening?
>
> That would be great. Taking a look at what happened, I have a feeling
> this may be a race condition of some kind in the isolation tester. It
> seems to have failed to recognize that a1 started waiting, and that
> caused the "deadlock detected" message to be reported differently. I'm
> not immediately sure what to do about that.

Yeah, so: try_complete_step() waits 10ms, and if it still hasn't
gotten any data back from the server, then it uses a separate query to
see whether the step in question is waiting on a lock. So what
must've happened here is that it took more than 10ms for the process
to show up as waiting in pg_stat_activity.
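
To make the suspected race concrete, here is a minimal simulation of a single pass of that check. It is a simplified model, not the isolation tester's actual C code: SimulatedStep, visible_after_ms, and the returned strings are hypothetical names invented for illustration, and the clock is faked so the outcome is deterministic.

```python
from dataclasses import dataclass


@dataclass
class SimulatedStep:
    """Hypothetical stand-in for a step that blocks on a lock."""

    # Assumed delay before the lock wait becomes visible to the probe query.
    visible_after_ms: float

    def has_result(self, now_ms: float) -> bool:
        # The step is blocked on a lock, so no data ever comes back.
        return False

    def shows_as_waiting(self, now_ms: float) -> bool:
        # The wait only becomes observable once the delay has elapsed.
        return now_ms >= self.visible_after_ms


def try_complete_step(step: SimulatedStep, timeout_ms: float = 10.0) -> str:
    """Model one pass: wait up to timeout_ms for data, then probe once."""
    now_ms = timeout_ms  # pretend the timed wait for data has just expired
    if step.has_result(now_ms):
        return "completed"
    if step.shows_as_waiting(now_ms):
        return "waiting"
    # Neither data nor a visible lock wait: the probe came too early.
    return "undetermined"
```

In this model a step whose wait becomes visible after 5ms is classified as waiting, while one that needs 15ms is missed by the 10ms probe; passing timeout_ms=20.0 catches it, at the cost of a slower test run.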

It might be possible to fix this by not passing STEP_NONBLOCK if
there's only one connection that isn't waiting. I think I had it like
that at one point, and then took it out because it caused some other
problem. Another option is to lengthen the timeout. It doesn't seem
great to be dependent on a fixed timeout, but the server doesn't send
any protocol traffic to indicate a lock wait. If we declared which
steps are supposed to wait, then there'd be no ambiguity, but that
seems like a drag.
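
As a rough sketch of the first idea only: the decision could hinge on how many connections are still able to make progress. The function name and parameters below are hypothetical, not taken from the isolation tester, and as noted above this approach was dropped once before because it caused some other problem.

```python
def should_pass_nonblock(total_connections: int, waiting_connections: int) -> bool:
    """Decide whether a step should be run with the nonblocking check.

    Hypothetical rule: if only one connection is not already waiting,
    skip the nonblocking check and simply block until the server answers.
    """
    if not 0 <= waiting_connections <= total_connections:
        raise ValueError("waiting_connections must be between 0 and total_connections")
    not_waiting = total_connections - waiting_connections
    # With more than one connection still able to run, a step may
    # legitimately block on another session, so the timed check is needed.
    return not_waiting > 1
```

With two connections and one already waiting, the rule returns False, so the remaining step would block until it either completes or an error such as "deadlock detected" comes back, and the timeout would no longer matter in that case.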

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
