Re: parallel mode and parallel contexts

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel mode and parallel contexts
Date: 2015-03-24 19:26:28
Message-ID: 20150324192628.GG15229@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-03-19 14:13:59 -0400, Robert Haas wrote:
> On Wed, Mar 18, 2015 at 5:36 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > Reading the README first, the rest later. So you can comment on my
> > comments, while I actually look at the code. Parallelism, yay!
>
> Sorry, we don't support parallelism yet. :-)

And not even proper sequential work apparently...

> >> + - The currently active user ID and security context. Note that this is
> >> + the fourth user ID we restore: the initial step of binding to the correct
> >> + database also involves restoring the authenticated user ID. When GUC
> >> + values are restored, this incidentally sets SessionUserId and OuterUserId
> >> + to the correct values. This final step restores CurrentUserId.
> >
> > Ah. That's the answer for above. Could you just move it next to the
> > other user bit?
>
> Well, I think it's good to keep this in the same order it happens.
> That's almost true, with the exception of the libraries, which were
> out of order. (I've fixed that now.)

I don't find this convincing in the least. This should explain to a
developer which state he can rely on being shared. For that a sane order
is helpful. The precise order in which this happens doesn't seem to matter
for that at all; it's also likely to get out of date.

> >> +At the end of a parallel operation, which can happen either because it
> >> +completed successfully or because it was interrupted by an error, parallel
> >> +workers associated with that operation exit. In the error case, transaction
> >> +abort processing in the parallel leader kills off any remaining workers, and
> >> +the parallel leader then waits for them to die.
> >
> > Very slightly awkward because first you talk about successful *or* error
> > and then about abort processing.
>
> I don't understand what's awkward about that. I make a general
> statement about what happens at the end of a parallel operation, and
> then the next few sentences follow up by explaining what happens in
> the error case, and what happens in the success case.

I seem to have completely overlooked the "In the error case, " part of the
sentence. Forget what I wrote.

> >> +Copying locks to workers is important for the avoidance of undetected
> >> +deadlock between the initiating process and its parallel workers. If the
> >> +initiating process holds a lock on an object at the start of parallelism,
> >> +and the worker subsequently attempts to acquire a lock on that same object
> >> +and blocks, this will result in an undetected deadlock, because the
> >> +initiating process cannot finish the transaction (thus releasing the lock)
> >> +until the worker terminates, and the worker cannot acquire the lock while
> >> +the initiating process holds it. Locks which the processes involved acquire
> >> +and then release during parallelism do not present this hazard; they simply
> >> +force the processes involved to take turns accessing the protected resource.
> >
> > I don't think this is a strong guarantee. There very well can be lack of
> > forward progress if they're waiting on each other in some way. Say the
> > master backend holds the lock and waits on output from a worker. The
> > worker then will endlessly wait for the lock to become free. A
> > deadlock. Or, as another scenario, consider cooperating backends that
> > both try to send tuples to each other but the queue is full. A deadlock.
>
> The idea is that both the master and the workers are restricted to
> locks which they take and release again. The idea is, specifically,
> to allow syscache lookups. Taking a lock and then waiting for the
> worker is no good, which is why WaitForParallelWorkersToFinish() calls
> CheckForRetainedParallelLocks(). That would catch your first
> scenario.
>
> Your second scenario seems to me to be a different kind of problem.
> If that comes up, what you're going to want to do is rewrite the
> workers to avoid the deadlock by using non-blocking messaging.

I don't think that actually solves the problem. You'll still end up
blocking on full "pipes" at some point. Whether that happens inside the
messaging code or outside it doesn't particularly matter.

> In any case, I don't think it's really this patch's job to prevent
> deadlocks in code that doesn't exist today and in no way involves the
> lock manager.

Well, you're arguing that we need a solution in the parallelism
infrastructure for the lock deadlock problem. I don't think it's absurd
to extend that care to other cases.

> > To me it seems the deadlock detector has to be enhanced to be able to
> > see 'waiting for' edges. Independent on how we resolve our difference of
> > opinion on the copying of locks.
> >
> > It seems to me that this isn't all that hard: Whenever we block waiting
> > for another backend we enter ourselves on the wait queue to that
> > backend's virtual transaction. When finished we take the blocking
> > backend off. That, afaics, should do it. Alternatively we can just
> > publish what backend we're waiting for in PGPROC and make deadlock also
> > look at that; but, while slightly cleaner, that looks like being more
> > invasive.
>
> That's an interesting idea. It would be more flexible than what I've
> got here right now, in that parallel backends could take and retain
> locks on arbitrary objects, and we'd only error out if it actually
> created a deadlock, instead of erroring out because of the potential
> for a deadlock under some unlikely circumstances.

It also seems like it'd be able to deal with a bunch of scenarios that
the current approach wouldn't be able to deal with.

> But it can't be
> done with existing lock manager APIs - right now there is no way to
> put yourself on a wait queue for a virtual transaction except to try
> to acquire a conflicting lock, and that's no good because then you
> aren't actually trying to read data from it.

Right.

> You'd need some kind of
> API that says "pretend I'm waiting for this lock, but don't really
> wait for it", and you'd need to be darn sure that you removed yourself
> from the wait queue again before doing any other heavyweight lock
> manipulation. Do you have specific thoughts on how to implement this?

I've thought about this some, and I think it's a bit easier not to do it
on the actual lock wait queues, but instead to teach deadlock.c about that
kind of blocking.

deadlock.c is far from simple, and I at least don't find the control
flow to be particularly clear, so it's not easy. It'd still be
advantageous to tackle things at that level, because it'd avoid the need
to acquire locks on some lock's wait queue whenever we block; we're going
to do that a lot.

But it seems to me that it should be possible to succeed: in
FindLockCycleRecurse(), in the case that we're not waiting for an actual
lock (checkProc->links.next == NULL), we can add a case that considers
the 'blocking parallelism' case. ISTM that that's just a
FindLockCycleRecurse() call on the process that we're waiting for. We'd
either have to find the other side's locktag for DEADLOCK_INFO or invent
another field in there; but that seems like a solvable problem.
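
To make that a bit more concrete, here's a toy model of the idea --
emphatically not actual deadlock.c code; the ToyProc struct and its
"waiting for another backend" field are made up purely for illustration.
The point is just that the lock-based waits-for edges deadlock.c already
follows and the new 'blocking parallelism' edges can be handled by the
same recursion:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a backend; not PGPROC. */
typedef struct ToyProc
{
    struct ToyProc *wait_lock_holder;   /* holder of the lock we block on, or NULL */
    struct ToyProc *wait_backend;       /* backend we block on directly, or NULL */
    bool            visited;
} ToyProc;

/* Does following waits-for edges from "p" lead back to "start"? */
static bool
reaches(ToyProc *p, ToyProc *start)
{
    if (p == NULL || p->visited)
        return false;
    if (p == start)
        return true;
    p->visited = true;

    /* ordinary heavyweight-lock edge, which deadlock.c already follows */
    if (reaches(p->wait_lock_holder, start))
        return true;
    /* the additional "waiting for another backend" edge proposed above */
    return reaches(p->wait_backend, start);
}

int
main(void)
{
    ToyProc     leader = {0};
    ToyProc     worker = {0};

    /* the leader blocks waiting for the worker's output ... */
    leader.wait_backend = &worker;
    /* ... while the worker blocks on a lock the leader still holds */
    worker.wait_lock_holder = &leader;

    printf("deadlock: %s\n",
           (reaches(leader.wait_lock_holder, &leader) ||
            reaches(leader.wait_backend, &leader)) ? "yes" : "no");
    return 0;
}

In the real thing the extra edge would presumably live in PGPROC (or be
derived from it), and we'd still need the other side's locktag or a new
field for DEADLOCK_INFO, as said above.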

> > Am I missing something or does the copying currently break deadlock.c?
> > Because afaics that'll compute lock conflicts in FindLockCycleRecurse()
> > without being aware of the conflicting lock being granted to two
> > backends. Won't this at least trigger spurious deadlocks? It might
> > happen to be without consequence for some reason, but this would, at the
> > very least, need some very careful review.
>
> There may be a problem here, but I'm not seeing it. Initially, we're
> blocking on a lock that we don't already hold, so it's not one of the
> copied ones. If we follow one or more waits-for edges and arrive back
> at a copied lock, that's a real deadlock and should be reported as
> such.

That's a fair point. I was worried that this was going to introduce
additional hard edges between processes. I think it actually might, but
only when a lock is upgraded, which really shouldn't happen for the
copied locks.

Also: Man, trying to understand the guts of deadlock.c only made me
understand how *friggin* expensive deadlock checking is. I'm really
rather surprised that it only infrequently causes problems. I think I
have seen a report or two where it might have been the deadlock check
that went bonkers during schema upgrades on larger setups, but that's
it.

I'm not sure if I said this somewhere before: if we aborted parallelism
whenever any of the to-be-copied locks would conflict with its copy,
instead of duplicating them, I could live with this. Then this would
amount to jumping the lock queue, which seems reasonable to me. Since not
being able to do parallelism already needs to be handled gracefully, this
doesn't seem too bad.
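
As a rough illustration of what I mean -- again just a toy sketch, not
the actual lock manager data structures, and with a simplified conflict
table standing in for the real one: before launching workers, the leader
would scan the locks it'd otherwise copy and fall back to serial
execution if any held mode conflicts with itself, i.e. if the worker's
copy could never have been granted alongside the leader's original.

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for lock modes; not PostgreSQL's real set. */
typedef enum ToyLockMode
{
    TOY_ACCESS_SHARE,
    TOY_ROW_EXCLUSIVE,
    TOY_SHARE,
    TOY_ACCESS_EXCLUSIVE,
    TOY_NUM_LOCKMODES
} ToyLockMode;

/* conflicts[a][b]: does mode a conflict with mode b? (simplified table) */
static const bool conflicts[TOY_NUM_LOCKMODES][TOY_NUM_LOCKMODES] = {
    /* ACCESS_SHARE     */ {false, false, false, true},
    /* ROW_EXCLUSIVE    */ {false, false, true,  true},
    /* SHARE            */ {false, true,  false, true},
    /* ACCESS_EXCLUSIVE */ {true,  true,  true,  true},
};

/*
 * Would handing copies of these held locks to a worker be safe?  If a
 * held mode conflicts with itself, the copy couldn't have been granted
 * alongside the original, so we'd refuse parallelism instead of copying.
 */
static bool
locks_safe_to_copy(const ToyLockMode *held, int nheld)
{
    for (int i = 0; i < nheld; i++)
    {
        if (conflicts[held[i]][held[i]])
            return false;
    }
    return true;
}

int
main(void)
{
    ToyLockMode held[] = {TOY_ACCESS_SHARE, TOY_ROW_EXCLUSIVE, TOY_ACCESS_EXCLUSIVE};

    if (locks_safe_to_copy(held, 3))
        printf("ok to launch workers\n");
    else
        printf("abort parallelism, run serially\n");
    return 0;
}

That check amounts to the queue-jumping behaviour described above:
granted locks that don't self-conflict get shared, anything else just
disables parallelism for that query.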

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
