group locking: incomplete patch, just for discussion

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: group locking: incomplete patch, just for discussion
Date: 2014-10-15 04:03:45
Lists: pgsql-hackers

For parallelism, I think we need a concept of group locking. That is,
suppose we have a user backend and N worker backends collaborating to
execute some query. For the sake of argument, let's say it's a
parallel CLUSTER, requiring a full table lock. We need all of the
backends to be able to lock the table even though at least one of them
holds AccessExclusiveLock. This suggests that the backends should all
be members of a locking group, and that locks within the same locking
group should be regarded as mutually non-conflicting.

At least to me, that simple scenario is clear-cut[1], but what do we
do in more complicated situations? For example, suppose backends A
and B are members of the same locking group. A locks a relation with
AccessShareLock, an unrelated process X queues up waiting for an
AccessExclusiveLock, and then B also requests AccessShareLock. The
normal behavior here is that B should wait for X to go first, but here
that's a problem. If A is the user backend and B is a worker backend,
A will eventually wait for B, which is waiting for X, which is waiting
for A: deadlock. If it's the other way around, we might survive, but
at the very best it's undesirable to have "parallelism" where A and B
are actually completely serialized, with X taking its turn in the
middle. Now, in practical cases, we probably expect that if A and B
lock the same relation, they're going to do so in very quick
succession, so the actual behavior of the lock manager shouldn't
matter much in terms of fairness. But since we can't predict what
other processes may do in the window between those lock acquisitions,
a sound theoretical approach is needed to avoid deadlocks.

After some thought I've come to the conclusion that we should assume
that every process in a locking group will eventually wait for every
other process in that locking group. It follows that once any single
process in a locking group obtains any lock on an object, any future
lock request by another lock group member on that object should, if
non-conflicting, be granted immediately. It also follows that, when
processes in a locking group are waiting for a lock on an object, none
should be awoken until all of the group's pending requests on that
object can be granted. For
example, suppose process X holds AccessExclusiveLock on a relation.
First, process Y waits for AccessShareLock. Then, processes A and B,
members of the same locking group, queue up respectively for
AccessShareLock and AccessExclusiveLock, in that order. Next, process
X releases its lock. At this point, we should wake Y *but not A*; A
should continue to wait until we can grant locks to both A and B. The
reason is that a new process Q might come along and, for purposes of
deadlock avoidance, we might need to reorder the lock queue. Putting
Q ahead of both A and B is fine; but putting it between A and B is bad
for the reasons discussed in the previous paragraph.

Attached is an incomplete draft patch. It modifies
LockAcquireExtended() and ProcSleep() but not ProcWakeup() or the
deadlock detector. Aside from being incomplete, which is a problem in
and of itself, I don't have a very good idea how to comprehensively
test something like this. I can of course cobble together some test
code for particular scenarios, but it's not at all clear to me what a
comprehensive test plan would look like. Any ideas on that point
would be particularly appreciated, as would feedback on the overall
design. A lot more work is clearly needed here, but I've been known
to advocate soliciting feedback on proposed patches early, so here I
am taking my own advice.

Robert Haas
The Enterprise PostgreSQL Company

[1] You could argue that one process instead should take locks on
behalf of the group. But I think this is a terrible approach. First,
it means that whichever process is holding the locks, presumably the
user backend, had better not try to do any real computation of its
own, so that it can timely service lock requests, which seems like a
huge loss of efficiency; if the optimum degree of parallelism is 2,
we'll need 50% more processes and the whole result set will have to be
copied an extra time. Yuck. Second, it means that the lock-holding
process must refuse to die until every process on whose behalf it is
holding locks no longer needs them. And I hate
backend processes that refuse to die with a fiery passion that cannot
be quenched.

Attachment Content-Type Size
group-locking-v0.patch text/x-patch 32.1 KB

