Re: parallel mode and parallel contexts

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel mode and parallel contexts
Date: 2015-05-07 13:47:39
Message-ID: CA+TgmoZ=eapnv-yvfg+wxLyftXnRchr2MtEeT+NKdVSL70B3BA@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 7, 2015 at 3:09 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> Each individual queue has only a single reader and a single writer.
>> In your example, there would be three queues, not just one. Of
>> course, one could design a new shared memory data structure
>> representing a collection of related queues, but I still don't see
>> exactly how we get around the problem of that requiring
>> O(MaxBackends^2) storage space.
>
> If each backend waits on a uniformly-distributed 50% of other backends,
> tracking that wait graph indeed requires O(MaxBackends^2) space. Backends
> completing useful work in the field will have far-sparser wait graphs, and
> that should inform the choice of data structure:

We can, of course, underallocate and hope for the best.
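
To put rough numbers on the space question (purely illustrative; the
constants below are made up, not from any patch), a dense pairwise
representation versus a bounded per-backend edge list looks like this:

/* Illustrative only: dense vs. sparse storage for a wait graph.
 * The numbers are hypothetical, not taken from any actual patch. */
#include <stdio.h>

int
main(void)
{
    int     max_backends = 1024;        /* hypothetical MaxBackends */
    int     avg_edges_per_backend = 4;  /* hypothetical "sparse" workload */
    long    dense_bytes;
    long    sparse_bytes;

    /* Dense: one bit per ordered (waiter, waitee) pair. */
    dense_bytes = (long) max_backends * max_backends / 8;

    /* Sparse: a bounded per-backend edge list, 4 bytes per edge. */
    sparse_bytes = (long) max_backends * avg_edges_per_backend * 4;

    printf("dense:  %ld bytes\n", dense_bytes);     /* ~128 kB */
    printf("sparse: %ld bytes\n", sparse_bytes);    /* ~16 kB */
    return 0;
}

The sparse layout is exactly the underallocation: it works as long as
real wait graphs stay as thin as Noah expects.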

>> > M
>> > subscribes to queue0-nonempty, and each of W1,W2,W3 publish to it. M
>> > publishes to queue0-nonfull, and the workers subscribe. An edge forms in the
>> > deadlock detector's graph when all of an event's subscribers wait on it. (One
>> > can approximate that in userspace with a pair of advisory locks.)
>>
>> An edge between which two processes?
>
> I hadn't de-fuzzed my thinking that far yet. If you still need the answer
> after this message, let me know, and I'll work on that. As a guess, I think
> it's three edges M-W1, M-W2 and M-W3.

I think it's an edge where one end is M and the other end is fuzzily
diffused across W1, W2, and W3, because we only need one of them to
wake up. There's no way to represent such a thing in the deadlock
detector today. We could invent one, of course, but it sounds
complicated. Also, I suspect deadlock checking in this world amounts
to the Boolean satisfiability problem
(http://en.wikipedia.org/wiki/Boolean_satisfiability_problem) and is
therefore NP-complete.
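
For illustration only, here is roughly what such a diffused edge would
have to look like; none of these names exist in the current deadlock
detector, and the point is only that a naive checker has to consider
every way of resolving each "any of" edge to a single target:

/* Hypothetical illustration, not PostgreSQL code.  An ordinary wait-for
 * edge has one waiter and one waitee, and deadlock checking is a cycle
 * search.  If an edge instead means "the waiter is released when ANY ONE
 * of several targets acts", a naive checker examines one resolved graph
 * per way of picking a target for each such edge. */
#include <stdio.h>

#define MAX_TARGETS 3

typedef struct OrEdge
{
    int     waiter;                  /* process that is waiting */
    int     ntargets;
    int     targets[MAX_TARGETS];    /* any one of these can release it */
} OrEdge;

int
main(void)
{
    /* M (0) waits on any of W1, W2, W3 (1, 2, 3); each worker waits on M. */
    OrEdge  edges[] = {
        {0, 3, {1, 2, 3}},
        {1, 1, {0}},
        {2, 1, {0}},
        {3, 1, {0}},
    };
    int     nedges = 4;
    long    combinations = 1;
    int     i;

    /* One resolved graph per combination of choices. */
    for (i = 0; i < nedges; i++)
        combinations *= edges[i].ntargets;

    printf("resolved graphs to examine: %ld\n", combinations);  /* 3 here */
    return 0;
}

With many such edges the number of resolved graphs grows multiplicatively,
which is presumably where the hardness comes from.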

> I see what you had in mind. The problem showed up in the last sentence;
> namely, the hand-off from process to process is not atomic like it is for
> heavyweight locks. That's exactly what I was (not too clearly) getting at
> when I wrote, "This assumes ..." above. For the sake of illustration, suppose
> you replace your queue in this algorithm with a heavyweight lock and a small
> buffer. Initially, B holds the lock and A waits for the lock. The following
> event sequence repeats until the transfer is done:
>
> B fills the buffer
> B releases the lock, granting it to A during LockRelease()
> B starts waiting for the lock
> A empties the buffer
> A releases the lock, granting it to B during LockRelease()
> A starts waiting for the lock
>
> The deadlock detector will rightly never report a deadlock. To have the same
> safety in your example, it's not enough for one process to set the latch of
> another process. The process must update something in static shared memory to
> indicate that the waited-for condition (queue-nonempty for A, queue-nonfull
> for B) is now satisfied. You need such a unit of state anyway, don't you?

Yes. It's the number of bytes read and written, and it's stored in
dynamic shared memory. You can move some of the state to the main
shared memory segment, but I don't look forward to the performance
consequences of a lock acquire & release cycle every time the queue
fills or drains. Of course, there is also a loss of flexibility
there: a big reason for developing this stuff in the first place was
to avoid being constrained by the main shared memory segment.
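
For the concrete picture, this is roughly how that state is laid out
today, loosely following the shm_mq API (shm_mq.c; see also the
test_shm_mq module).  Worker startup, the receiver's side, and error
handling are all omitted, so read it as a sketch rather than working
code:

#include "postgres.h"

#include "storage/dsm.h"
#include "storage/proc.h"
#include "storage/shm_mq.h"

#define QUEUE_SIZE  65536

static void
illustrate_shm_mq(void)
{
    dsm_segment *seg;
    shm_mq     *mq;
    shm_mq_handle *mqh;
    shm_mq_result res;

    /*
     * The queue lives entirely in a dynamic shared memory segment.  The
     * shm_mq struct at its head holds the counts of bytes read and bytes
     * written; those counters are the state that tells a blocked reader
     * or writer whether its wait condition has been satisfied.
     */
    seg = dsm_create(QUEUE_SIZE);
    mq = shm_mq_create(dsm_segment_address(seg), QUEUE_SIZE);

    shm_mq_set_sender(mq, MyProc);
    /* ... the receiver is set by (or on behalf of) the other process ... */

    mqh = shm_mq_attach(mq, seg, NULL);

    /*
     * shm_mq_send() copies data into the ring buffer, advances the
     * bytes-written counter, and sets the receiver's latch; if the queue
     * is full, it waits on our own latch until the reader has advanced
     * the bytes-read counter.  No heavyweight-lock acquire/release cycle
     * in the main shared memory segment is needed per transfer.
     */
    res = shm_mq_send(mqh, sizeof("hello"), "hello", false);
    if (res != SHM_MQ_SUCCESS)
        elog(LOG, "could not send message");
}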

> The number of held LWLocks is always zero when entering user-defined code and
> again when exiting user-defined code. Therefore, the possible LWLock wait
> graphs are known at compile time, and we could prove that none contain a
> deadlock. Therefore, we scarcely miss having an LWLock deadlock detector.
> That does not map to the tuple queue behavior at hand, because we hope to
> allow the queue's writer to run user-defined code while the queue's reader is
> waiting. My term "user-defined code" does come with some hand-waving. A
> superuser can cause an undetected deadlock via a C-language hook. We would
> not call that a PostgreSQL bug, though the hook is literally user-defined.
> Let's keep the deadlock detector able to identify every deadlock a
> non-superuser can cause. That includes deadlocks arising from heavyweight
> lock acquisition in parallel-safe functions.

I'm in agreement with that goal.

>> I think where this project went off the rails is when we made the
>> decision as to how to fix the problems in the original group locking
>> approach. In the original concept, locks between processes in the
>> same parallel group just didn't ever conflict; two
>> otherwise-conflicting locks held by different backends within the same
>> group were regarded as those processes sharing that lock. Andres and
>> possibly others pointed out that for stuff like relation locks, that's
>
> I think you mean "relation extension locks".

Yes, thanks.

>> probably not the right behavior. I agree with that. It was also
>> suggested that this would be less scary if we limited ourselves to
>> sharing the locks held at the beginning of parallelism. That had the
>> further advantage of making things like relation extension locks,
>> which won't be held at the start of parallelism, unshared, while
>> relation locks would, in most cases, be shared. That felt pretty
>> good, so I did it, but I now think that was a mistake, because it
>> creates edge cases where parallel groups can self-deadlock. If we
>> instead regard relation locks between parallel group members as NEVER
>> conflicting, rather than conflicting only when both locks were
>> acquired after the start of parallelism, those edge cases go away.
>
> Yes. Then you're back to something like the LWLock scenario, where an
> undetected deadlock implies a PostgreSQL bug. That's a good place to be.

Good.
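
To spell out the rule I'm agreeing to, it would look something like
this.  This is a sketch, not code from any patch;
parallel_group_leader() and normal_lock_conflict() are stand-ins for
however group membership and the ordinary conflict-matrix test end up
being represented:

#include "postgres.h"

#include "storage/lock.h"
#include "storage/proc.h"

extern PGPROC *parallel_group_leader(PGPROC *proc);                    /* assumed */
extern bool normal_lock_conflict(LOCKMODE held, LOCKMODE requested);   /* assumed */

static bool
group_aware_conflict(const LOCKTAG *tag, LOCKMODE held, LOCKMODE requested,
                     PGPROC *holder, PGPROC *requester)
{
    PGPROC     *h_leader = parallel_group_leader(holder);
    PGPROC     *r_leader = parallel_group_leader(requester);

    /* Different parallel groups (or no group at all): the usual rules. */
    if (h_leader == NULL || h_leader != r_leader)
        return normal_lock_conflict(held, requested);

    /*
     * Same parallel group.  Relation extension still conflicts, because
     * two processes of one transaction must not extend the same relation
     * at once.  Everything else never conflicts, so the "acquired after
     * the start of parallelism" edge cases go away.
     */
    if (tag->locktag_type == LOCKTAG_RELATION_EXTEND)
        return normal_lock_conflict(held, requested);

    return false;
}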

>> The only downside is that if a worker A manages to do something to a
>> relation R that makes it unsafe for worker B to access, and worker B
>> then gets the lock anyway, we get no-holds-barred insanity instead of
>> a deadlock. But I am unconvinced that downside amounts to much,
>> because I believe the only way that A can make it unsafe for B to
>> access the relation is by doing something that CheckTableNotInUse()
>> will already prohibit in parallel mode. If it turns out there is some
>> oversight there, I don't think it'll be hard to plug.
>
> This is the bottom line, and I agree. I wrote more about that here:
> http://www.postgresql.org/message-id/20141226040546.GC1971688@tornado.leadboat.com
>
> I admit that it's alarming to have different conflict semantics by locktag.
> The tension originates, I think, from e.g. LOCKTAG_RELATION_EXTEND serving two
> distinct purposes simultaneously. As I wrote in the mail just cited, before
> altering a table in a manner that threatens concurrent usage, one must ensure
> that (a) other transactions and (b) other parts of my own transaction aren't
> in the way. Customarily, a heavyweight lock rules out (a), and
> CheckTableNotInUse() rules out (b). Relation extension does not have or need
> a CheckTableNotInUse() call or similar, because it doesn't call arbitrary code
> that might reenter the relation extension process. Under parallelism, though,
> it will be possible for multiple processes of a given transaction to attempt
> relation extension simultaneously. So we need a (b) test. It just so happens
> that allowing LOCKTAG_RELATION_EXTEND to conflict in a parallel group causes
> that lock acquisition to suffice for both purposes (a) and (b). That's neat
> but arguably impure. Every cure I've pondered has been worse than the
> disease, though. It will be okay.

That's my opinion as well.
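
To illustrate the two purposes Noah describes (simplified, and not
lifted from any real code path, though the functions named do exist
today):

#include "postgres.h"

#include "access/heapam.h"
#include "commands/tablecmds.h"
#include "storage/lmgr.h"

static void
illustrate_two_checks(Oid relid)
{
    Relation    rel;

    /*
     * A table-rewriting command guards against (a) other transactions
     * with a heavyweight lock, and against (b) other uses within its own
     * transaction with CheckTableNotInUse().
     */
    rel = heap_open(relid, AccessExclusiveLock);
    CheckTableNotInUse(rel, "ALTER TABLE");

    /* ... rewrite the table ... */

    heap_close(rel, NoLock);
}

static void
illustrate_extension(Relation rel)
{
    /*
     * Relation extension has no CheckTableNotInUse() equivalent.  Once
     * parallel workers exist, letting LOCKTAG_RELATION_EXTEND conflict
     * even within a parallel group makes this one acquisition cover both
     * (a) other transactions and (b) other processes of the same
     * transaction.
     */
    LockRelationForExtension(rel, ExclusiveLock);
    /* ... add a block ... */
    UnlockRelationForExtension(rel, ExclusiveLock);
}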

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
