Re: parallel mode and parallel contexts

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Euler Taveira <euler(at)timbira(dot)com(dot)br>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel mode and parallel contexts
Date: 2015-04-29 16:23:48
Message-ID: CA+TgmoYL4mXLZe6b9x68+z8SgHFuDiS0frwqDvAP_T8nD0HGTQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 28, 2015 at 2:38 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Apr 26, 2015 at 2:03 PM, Euler Taveira <euler(at)timbira(dot)com(dot)br> wrote:
>> On 19-03-2015 15:13, Robert Haas wrote:
>>> On Wed, Mar 18, 2015 at 5:36 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>>>> Reading the README first, the rest later. So you can comment on my
>>>> comments, while I actually look at the code. Parallelism, yay!
>>>
>> I'm also looking at this piece of code and found some low-hanging fruit.
>
> Thanks. 0001 and 0003 look good, but 0002 is actually unrelated to this patch.

So, I think it makes sense to split up this patch in two. There's no
real debate, AFAICS, about anything in the patch other than the
heavyweight locking stuff. So I'd like to go ahead and commit the
rest. That's attached here as parallel-mode-v10.patch.

The remaining question here is what to do about the heavyweight
locking. There seem to be a couple of possible approaches to that,
and perhaps with this specific issue separated out we can focus in on
that aspect of the problem. Let me start by restating my goals for
that portion of the patch:

1. Avoid undetected deadlocks. For example, suppose A and A1 are
cooperating parallel processes, and B is an unrelated process. If A1
waits for a lock held by B, and B waits for a lock held by A, and A is
waiting for A1 to finish executing before winding up parallelism, the
deadlock detector currently won't detect that. Either the deadlock
detector needs to know that A waits for A1, or it needs to view the
lock held by A as also held by A1, so that it can detect a cycle A1 ->
B -> A1.

2. Avoid unprincipled deadlocks. For example, suppose A holds
AccessExclusiveLock on a relation R and attempts a parallel sequential
scan on R. It launches a worker A1. If A1 attempts to lock R and
blocks, and A then waits for A1, we have a deadlock. Detecting this
deadlock (as per goal 1) is better than not detecting it, but it's not
good enough. Either A shouldn't have attempted a parallel sequential
scan in the first place, or it should run to completion without
deadlocking with its own workers.

3. Allow parallel DDL. For example, parallel VACUUM FULL or parallel
CLUSTER. We don't have code for these things today, but the locking
architecture we choose should not preclude doing them later.

4. Avoid restricting parallelism based on the history of the
transaction. If BEGIN; COPY foo FROM ...; SELECT ... FROM foo can use
parallelism, then BEGIN; TRUNCATE foo; COPY foo FROM ...; SELECT ...
FROM foo should also be able to use parallelism.

5. Impose as few restrictions as possible on what can be done in
parallel mode. For example, the current version of this patch
(attached as parallel-mode-locking.patch) allows parallel workers to
do system cache lookups, where a relation lock is taken and released,
but they cannot take and retain any heavyweight locks; so for example
a user-defined function that runs an SQL query against some unrelated
table and returns the results isn't parallel-safe right now. This is
not ideal.

That's all I've got for goals. How do we achieve them? So far we've
hit on three approaches to deadlock detection that I think are at
least somewhat plausible. None of them are perfect:

D-1. Teach the deadlock detector about implicit edges between
cooperating parallel processes. Upthread, I wrote about some problems
with this approach:
http://www.postgresql.org/message-id/CA+TgmoZ0tOPvfuz5BmEUqmdQ9xYQHzkG4jjJXduYDgQJXmPFbQ@mail.gmail.com
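
To make the implicit-edge idea concrete, here's a stand-alone sketch
of what it would mean for cycle detection, using illustrative types
rather than the real deadlock.c structures:

#include <stdbool.h>

typedef struct ProcSketch
{
    struct ProcSketch  *blockedOn;   /* holder of the lock we're waiting for */
    struct ProcSketch **workers;     /* parallel workers, if a group leader */
    int                 nworkers;
} ProcSketch;

/* Does following waits-for edges from "p" lead back to "start"? */
static bool
WaitsForCycle(ProcSketch *p, ProcSketch *start, int depth)
{
    if (depth > 0 && p == start)
        return true;                 /* found a cycle back to the start */
    if (depth > 100)
        return false;                /* crude recursion limit for the sketch */

    /* Ordinary edge: the process holding the lock we are blocked on. */
    if (p->blockedOn && WaitsForCycle(p->blockedOn, start, depth + 1))
        return true;

    /* Implicit edges: a leader waits for each of its workers to finish. */
    for (int i = 0; i < p->nworkers; i++)
    {
        if (WaitsForCycle(p->workers[i], start, depth + 1))
            return true;
    }
    return false;
}

With the scenario from goal 1 (A1 blocked on B, B blocked on A, and A
implicitly waiting for A1), starting the search at A1 finds the cycle
through the implicit edge.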

D-2. Copy locks from the master to every worker, and forbid workers
from acquiring any new, long-lived locks. (Locks that are taken and
released quickly, like a system catalog lock for a syscache lookup, or
a relation extension lock, won't cause deadlock.) Copying locks
ensures that whenever there is a dangerous structure like A1 -> B -> A
in the waits-for graph, there will also be a cycle A1 -> B -> A1, so
the unmodified deadlock detector will detect the deadlock. Forbidding
workers from acquiring any new, long-lived locks prevents locks taken
after the copying step from causing similar problems. The restriction
against new, long-lived locks is annoying. Also, a transaction
holding many locks (like pg_dump) and using many workers will use an
enormous number of lock table slots.
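
The copying step in each worker would look roughly like this; every
name here is made up for illustration, none of it is an existing lmgr
API:

typedef struct HeldLockSketch
{
    unsigned int  field1, field2, field3, field4;  /* identifies the object,
                                                    * like the real LOCKTAG */
    unsigned char tagType;                         /* relation, object, ... */
    int           lockMode;                        /* AccessShareLock, etc. */
} HeldLockSketch;

/* Hypothetical helper, not an existing lmgr function: install an
 * already-granted lock for this worker, bypassing the usual conflict
 * check (safe only because the leader already holds the same lock). */
extern void InstallGrantedLock(const HeldLockSketch *lock);

/* Run in each worker at startup over the list the leader serialized. */
static void
CopyLeaderLocks(const HeldLockSketch *locks, int nlocks)
{
    for (int i = 0; i < nlocks; i++)
        InstallGrantedLock(&locks[i]);

    /* From here on, under D-2, the worker may take only locks it
     * releases quickly (syscache lookups, relation extension);
     * acquiring new long-lived locks is forbidden. */
}

The lock-table bloat is visible right in that loop: every worker ends
up with its own copy of every entry the leader holds.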

D-3. Teach the deadlock detector to consider all locks held by any
member of the locking group as held by the leader. This guarantees
that all deadlocks between the parallel group and other processes in
the system will be detected. When a process in the group requests a
lock that conflicts with one already held by another group member,
but not with any held by a process outside the group, grant it
anyway, without waiting behind pending conflicting lock requests. At
the end of parallelism, transfer back to the leader any locks the
workers still hold. In a world where lock requests within a group
can't conflict, there's no such thing as a deadlock within a parallel
group, so we don't need to detect deadlocks of that type, and there
will never be any unprincipled deadlocks within a group because there
will never be any deadlocks within a group at all.
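
The essential change is in the conflict check. Here's a minimal
stand-alone sketch of the rule, with stand-in types rather than the
real lock.c structures:

#include <stdbool.h>

typedef struct ProcStub
{
    struct ProcStub *groupLeader;   /* group leader's entry; points to itself
                                     * for a backend that isn't in a group */
} ProcStub;

typedef struct ProcLockStub
{
    ProcStub            *owner;     /* backend holding these lock modes */
    int                  holdMask;  /* bitmask of modes it has been granted */
    struct ProcLockStub *next;      /* next holder of the same lock */
} ProcLockStub;

/* Would granting "proc" a request whose conflict mask is "reqConflictMask"
 * conflict with anything, ignoring holders in proc's own lock group? */
static bool
GroupAwareConflict(ProcStub *proc, int reqConflictMask, ProcLockStub *holders)
{
    for (ProcLockStub *pl = holders; pl; pl = pl->next)
    {
        /* Same lock group: pretend the modes it holds aren't there. */
        if (pl->owner->groupLeader == proc->groupLeader)
            continue;
        if (pl->holdMask & reqConflictMask)
            return true;            /* genuine conflict with an outsider */
    }
    return false;                   /* grant it, per D-3 */
}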

The biggest problem with this approach is figuring out how to make it
safe. I originally proposed this approach on the "group locking"
thread and it kind of went down in flames. If two parallel backends
are trying to extend the same relation at the same time, or lock the
same tuple or page at the same time, those requests have got to
conflict. In those cases, we are using locking to enforce real mutual
exclusion. However, that doesn't seem like a serious problem. First,
locks of those types will not be held at the start of parallelism;
starting parallelism while you hold a lock of one of those types would
be ridiculous. Second, locks of those types are never long-lived: you
take them, do what you need to do, and then release them. Finally, I
don't believe you ever take any other heavyweight locks while holding
them; I *think* they're always the last heavyweight lock you take. So
I think we could allow these locks to conflict normally without
introducing any deadlock risk. The only locks that we need to
consider as mutually non-conflicting are relation and database object
locks. Those are a different kettle of fish: when we change a
relation's relfilenode, for example, we must keep the relation locked
for the duration of the transaction because of MVCC concerns. The new
pg_class tuple isn't visible to anyone else yet, and any further
modifications to the relation must be done using the new relfilenode.
But none of that matters for a parallel worker, which shares the same
snapshot. There are still cases that do matter; for example, if a
parallel backend could REINDEX a relation being concurrently scanned by
another parallel backend in the same group, that would cause problems,
because REINDEX uses backend-local state that wouldn't be shared. But
these cases can also arise without parallelism, due to cursors, and
there is an existing mechanism (CheckTableNotInUse) to prevent them.
So I think that's OK, too. If not, we can fix other problems as we
find them; I don't see any major architectural problems here.
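
Stated as code, the rule for which lock types should still conflict
within a group boils down to a small predicate on the lock tag type.
This is a sketch with stand-in names, not the real LOCKTAG enum:

#include <stdbool.h>

/* Stand-ins for the lock tag types discussed above. */
typedef enum
{
    SK_RELATION,                    /* relation locks */
    SK_OBJECT,                      /* other database-object locks */
    SK_RELATION_EXTEND,             /* relation extension */
    SK_PAGE,                        /* page locks */
    SK_TUPLE                        /* tuple locks */
} SketchLockTagType;

/* true => members of the same lock group should still block each other */
static bool
ConflictsWithinGroup(SketchLockTagType tagtype)
{
    switch (tagtype)
    {
        case SK_RELATION_EXTEND:
        case SK_PAGE:
        case SK_TUPLE:
            /* Short-lived locks enforcing real mutual exclusion: taken,
             * used, and released without acquiring other heavyweight
             * locks in the meantime. */
            return true;
        default:
            /* Relation and database-object locks: treat the whole group
             * as one holder, so these never conflict within the group. */
            return false;
    }
}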

The other big problem with this approach is figuring out how to
implement it. It doesn't work to have the worker processes manipulate
the leader's lock-related data structures as if they were the parent,
both because the locking code relies on the shared data structures to
match its backend-private data structures and also because if we do
block waiting for a lock, there could be more than one waiting process
in the lock group at a time, and we need to know which specific
processes are waiting so we can wake them up at the correct time.
Instead, I think the way to do this is to add an additional field to
each PROCLOCK storing a pointer to the leader's PGPROC;
FindLockCycleRecurse() can check whether two PGPROCs have the same
leader pointer instead of whether the PGPROCs themselves are the same.
It can also refuse to follow a waits-for edge if that edge is of one
of the types we do want to conflict within a locking group (e.g.
relation extension).
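
Roughly, I'm imagining something along these lines; the names are
made up for illustration, this isn't a real patch against lock.h:

#include <stdbool.h>

struct PGPROC;                      /* opaque stand-in for the real PGPROC */

/* Not a patch, just the shape of the idea: each PROCLOCK carries a
 * pointer to its owner's group leader. */
typedef struct ProcLockSketch
{
    struct PGPROC *myProc;          /* backend owning this PROCLOCK */
    struct PGPROC *groupLeader;     /* proposed new field: the group leader's
                                     * PGPROC, or myProc if not in a group */
    /* ... existing fields: holdMask, releaseMask, queue links ... */
} ProcLockSketch;

/* Where FindLockCycleRecurse() currently asks "is this the PGPROC I
 * started from?", it would instead ask "does this belong to the same
 * lock group?" */
static bool
SameLockGroup(const ProcLockSketch *a, const ProcLockSketch *b)
{
    return a->groupLeader == b->groupLeader;
}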

---

I'm OK with any of these approaches if we can agree on how to solve
the problems they respectively present. Currently, I've got an
implementation of D-2; I started with an implementation of D-3, which
was flawed, but perhaps the idea is salvageable, per the discussion
above. Andres has consistently advocated for D-1, but see the link
above for where I'm stuck in terms of implementing that. For all I
know there may be another approach we haven't come up with yet, and
that's fine too if it accomplishes the goals listed above.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
parallel-mode-v10.patch binary/octet-stream 127.8 KB
parallel-mode-locking.patch binary/octet-stream 22.3 KB
