Re: group locking: incomplete patch, just for discussion

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: group locking: incomplete patch, just for discussion
Date: 2014-11-19 16:03:12
Message-ID: CA+Tgmoa_+u-qCcFZ9zAVZFk+TGXbqQB1+0ktPZRGgPJ5PvrJVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 18, 2014 at 4:40 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> To reiterate the basic problem here, if we do nothing at all about the
>> lock manager, a parallel backend can stall trying to grab an
>> AccessExclusiveLock that the user backend alread holds, either because
>> the user backend holds an AccessExclusiveLock as well, or because some
>> other process is waiting for one, we'll deadlock and not notice.
>
> My feeling is that we should keep the concept of a group and group
> leader from your patch, and improve the deadlock detector to make use of
> that information (looking at the code, it looks doable but not trivial).
> But unless I am missing something, we should separate out the lock
> sharing, and defer that until later.

Doing as you propose solves this problem:

1. Backend A-1 acquires AccessShareLock on relation R.
2. Backend B waits for AccessExclusiveLock on relation R.
3. Backend A-2 waits for AccessShareLock on relation R.

Knowing that A-1 and A-2 are related, the deadlock detector can move
A-2 ahead of B and grant the lock. All is well.

But your proposal does not solve this problem:

1. Backend A-1 acquires AccessExclusiveLock on relation R.
2. Backend A-2 waits for AccessShareLock on relation R.

The good news is that the deadlock detector should realize that since
A-1 and A-2 are in the same group, this is a deadlock. And it can
abort either A-1 or A-2, which will eventually abort them both. The
bad news, to borrow a phrase from Peter Geoghegan, is that it's an
unprincipled deadlock; a user confronted with the news that her
parallel scan has self-deadlocked will be justifiably dismayed. It's
been proposed by Andres, and maybe a few other people, that we can
detect this situation and just not use parallelism in these cases, but
that's actually quite hard to do.

Of course, it's pretty easy for a backend A-1 contemplating parallel
scan of relation R to notice that it holds a lock that conflicts with
the AccessShareLock it expects a prospective parallel backend A-2 to
attempt to acquire. I don't think there's a lock manager method for
that today, but we could easily add one. However, A-1 might
incidentally hold other locks (from earlier statements in the
transaction) that conflict with locks that will be sought by A-2, and
as discussed *extensively* upthread, it is not easy to predict all of
the locks a parellel backend might try to take: even seemingly-trivial
code like the equality operators for built-in data types can sometimes
take relation locks, and we have no infrastructure for predicting
which ones.

Even if you could somehow solve that problem, there's also the issue
of plan caching. If we make a parallel plan for an SQL statement
inside a PL/pgsql procedure because our backend holds no strong locks,
and then we acquire a strong lock on a relevant relation, we have to
invalidate that plan. I think that will add complexity inside the
plan cache machinery. It seems to me that it would be better to
handle these issues inside the lock manager rather than letting them
creep into other parts of the system.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2014-11-19 16:04:49 Re: Add shutdown_at_recovery_target option to recovery.conf
Previous Message Abhijit Menon-Sen 2014-11-19 15:58:11 Re: What exactly is our CRC algorithm?