Re: Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.
Date: 2016-02-17 14:52:12
Message-ID: CA+TgmoZqOpSJk1H9U4JxbQuX5Ywzb0=ZVRnzZzBQj3RCJuzbLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Tue, Feb 16, 2016 at 2:59 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
>> I wonder if we shouldn't just expose a 'which pid is process X waiting
>> for' API, implemented serverside. That's generally really useful, and
>> looks like it's actually going to be less complicated than that
>> query... And it's surely going to be faster.
>
> Attached is a draft patch for a new function that reports the set of PIDs
> directly blocking a given PID (that is, holding or awaiting conflicting
> locks on the lockable object it's waiting for).
>
> I replaced isolationtester's pg_locks query with this, and found that
> it's about 9X faster in a normal build, and 3X faster with
> CLOBBER_CACHE_ALWAYS turned on. That would give us some nice headroom
> for the isolation tests with CLOBBER_CACHE_ALWAYS animals. (Note that
> in view of
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2016-02-14%2007%3A38%3A37
> we still need to do *something* about the speed of the new deadlock-hard
> test; this patch could avoid the need to dumb down or slow down that test
> even further.)
>
> Not to be neglected also is that (I believe) this gives the right answer,
> whereas isolationtester's existing query is currently completely broken by
> parallel queries, and it doesn't understand non-conflicting lock modes
> either. (It did, at least partially, before commit 38f8bdcac4982215;
> I am not sure that taking out the mode checks was a good idea. But
> putting them back would make the query slower yet.)

The reason I took that out is because it breaks the deadlock-soft
test. It's possible to have a situation where no granted lock
conflicts with an awaited lock. If that happens, the old query
wrongly concluded that the waiting process was not in fact waiting.
(Consider A hold AccessShareLock, B awaits AccessExclusiveLock, C now
requests AccessShareLock and *waits*.)

As for the patch itself, I'm having trouble grokking what it's trying
to do. I think it might be worth having a comment defining precisely
what we mean by "A blocks B". I would define "A blocks B" in general
as either A holds a lock which conflicts with one sought by B
(hard-blocked) or A awaits a lock which conflicts with one sought by B
and precedes it in the wait queue (soft-blocked). I have wondered
before if we shouldn't modify pg_locks to expose the wait-queue
ordering; without that, you can't reliably determine in general
whether A soft-blocks B, which means every view anyone has ever
written over pg_locks that purports to say who blocks who is
necessarily buggy.

For parallel queries, there's a further relevant distinction when we
say "A blocks B". We might mean either that (1) process B cannot
resume execution until the lock conflict is resolved or (2) the group
leader for process B cannot complete the current parallel operation
until the lock conflict is resolved. If you're trying to figure out
why one particular member of a parallel group is stuck, you want to
answer question #1. If you're trying to figure out what all the
things that need to get out of the way to finish the query, you want
to answer question #2. I think this function is aiming to answer
question #2, not question #1, but I'm less clear on the reason behind
that definitional choice.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2016-02-17 15:04:54 Re: Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.
Previous Message Tom Lane 2016-02-17 14:25:48 Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

Browse pgsql-hackers by date

  From Date Subject
Next Message Teodor Sigaev 2016-02-17 14:56:26 Re: WIP: Access method extendability
Previous Message Artur Zakirov 2016-02-17 14:37:58 Re: Fuzzy substring searching with the pg_trgm extension