Re: parallel mode and parallel contexts

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel mode and parallel contexts
Date: 2015-02-11 18:59:04
Message-ID: CA+TgmobhSd_14Fb6kV6GkSVK=7P-xwPqU5ZwtaZWZNMfVPnksA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 10, 2015 at 8:45 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> Suggestions for a better name? CreateParallelContextForLoadableFunction?
>
> *ExternalFunction maybe? That's what dfmgr.c calls it.

Done.

>> It can't just break, because clearing pcxt->worker[i].error_mqh is an
>> essential step. I could add a flag variable that tracks whether any
>> registrations have failed and change the if statement to if
>> (!any_registrations_failed && RegisterDynamicBackgroundWorker(&worker,
>> &pcxt->worker[i].bgwhandle)), if you want. I thought about doing that
>> but decided it was very unlikely to affect the real-world performance
>> of anything, so easier just to keep the code simple. But I'll change
>> it if you want.
>
> I think it'd be better.

Done.
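
For concreteness, the registration loop now looks roughly like this
(an abbreviated sketch; worker setup and error handling elided):

    for (i = 0; i < pcxt->nworkers; ++i)
    {
        if (!any_registrations_failed &&
            RegisterDynamicBackgroundWorker(&worker,
                                            &pcxt->worker[i].bgwhandle))
            shm_mq_set_handle(pcxt->worker[i].error_mqh,
                              pcxt->worker[i].bgwhandle);
        else
        {
            /*
             * Don't bother registering any more workers, but still
             * clear error_mqh; that's the essential step.
             */
            any_registrations_failed = true;
            pcxt->worker[i].bgwhandle = NULL;
            pfree(pcxt->worker[i].error_mqh);
            pcxt->worker[i].error_mqh = NULL;
        }
    }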

>> This is a good question, but the answer is not entirely clear to me.
>> I'm thinking I should probably just remove
>> process_session_preload_libraries() altogether at this point. That
>> was added at a time when RestoreLibraryState() didn't exist yet, and I
>> think RestoreLibraryState() is a far superior way of handling this,
>> because surely the list of libraries that the parallel leader
>> *actually had loaded* is more definitive than any GUC.
>
> That sounds like a good idea to me.

Done.

> Well, it's pretty much never the case that the library is loaded before
> postgresql.conf gucs, right? A changed postgresql.conf is the only
> exception I can see. Neither is it the normal case for
> session|local_preload_libraries. Not even when GUCs are loaded via
> pg_db_role_setting or the startup packet...

Well, it can happen when you issue an explicit LOAD, or an implicit
one by calling a function that's in that module, and then go back and
set the GUC afterward, if nothing else.

> Anyway, I think this is a relatively minor issue.

I'm going to leave this alone for now. Evidence may emerge that this
is better done in the other order, but in the absence of evidence I'm
going to follow my gut rather than yours. With all due respect for
your gut, of course.

> The only reason I'd like it to be active is because that'd *prohibit*
> doing the crazier stuff. There seems little reason not to do it, given
> the additional protection against crazy things that it'd give us?

Trying to load additional libraries once parallel mode is in progress
can produce failures, because loading the library causes the
system to do GUC_ACTION_SET on the GUCs whose initialization was
deferred, and that trips up the
no-changing-things-while-in-parallel-mode checks. I'm not sure if
there's anything we can, or should, do about that.
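
For reference, the check that trips is the parallel-mode guard in
set_config_option(), roughly:

    /* Rough sketch of the guard this patch adds to guc.c. */
    if (IsInParallelMode() && changeVal && action != GUC_ACTION_SAVE)
        ereport(elevel,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("cannot set parameters during a parallel operation")));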

> Well, ExportSnapshot()/Import has quite a bit more restrictions than
> what you're doing... Most importantly it cannot import in transactions
> unless using read committed isolation, forces xid assignment during
> export, forces the old snapshot to stick around for the whole
> transaction and only works on a primary. I'm not actually sure it makes
> a relevant difference, but it's certainly worth calling attention to.
>
> The fudging I was wondering about certainly is unnecessary though -
> GetSnapshotData() has already performed it...
>
> I'm not particularly awake right now and I think this needs a closer
> look by someone more awake... I'm not fully convinced this is safe.

I'm not 100% comfortable with it either, but I just spent some time
looking at it and can't see what's wrong with it. Basically, we're
trying to get the parallel worker into a state that matches the
master's state after doing GetTransactionSnapshot() - namely,
CurrentSnapshot should point to the same snapshot as on the master,
FirstSnapshotSet should be true, and any additional processing that
GetTransactionSnapshot() would have done at a higher transaction
isolation level should also have been performed. It's possible we
don't need to mimic
that state, but it seems like a good idea.
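
In rough code terms, the worker-side restore amounts to something like
the following; this is an illustrative sketch, not the exact patch code:

    /* Mimic the master's post-GetTransactionSnapshot() state. */
    CurrentSnapshot = RestoreSnapshot(serialized_snapshot);
    FirstSnapshotSet = true;
    if (IsolationUsesXactSnapshot())
    {
        /*
         * ... plus whatever extra bookkeeping GetTransactionSnapshot()
         * performs at higher isolation levels, e.g. marking the
         * snapshot as copied and registering it with the owner.
         */
    }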

Still, I wonder if we ought to be banning GetTransactionSnapshot()
altogether. I'm not sure if there's ever a time when it's safe for a
worker to call that.

>> > Also, you don't seem to
>> > prohibit popping the active snapshot (should that be prohibited
>> > maybe?) which might bump the initiator's xmin horizon.
>>
>> I think as long as our transaction snapshot is installed correctly our
>> xmin horizon can't advance; am I missing something?
>
> Maybe I'm missing something, but why would that be the case in a read
> committed session? ImportSnapshot() only ever calls
> SetTransactionSnapshot in such a case (why does it contain code to cope
> without it?), but your patch doesn't seem to guarantee that.
>
> But as you don't actually transport XactIsoLevel anywhere (an omission,
> probably?) that seems to mean that SetTransactionSnapshot() won't do
> much, even if the source transaction is repeatable read.

Doesn't XactIsoLevel get set by assign_XactIsoLevel() when we restore
the GUC state?

> Now the thing that actually does give some guarantees is that you
> immediately afterwards restore the active snapshot and do a
> PushActiveSnapshot(), which'll prevent MyPgXact->xmin from changing
> because SnapshotResetXmin() refuses to do anything if there's an
> ActiveSnapshot. Seems a bit complicated and fragile though.

Suggestions?
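
For anyone following along, the guard being relied on here is
SnapshotResetXmin() in snapmgr.c, which is approximately:

    static void
    SnapshotResetXmin(void)
    {
        /* Keep our xmin while any snapshot is registered or active. */
        if (RegisteredSnapshots == 0 && ActiveSnapshot == NULL)
            MyPgXact->xmin = InvalidTransactionId;
    }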

>> Terminate is 'X', not 'T'
>
> Oops, yes.
>
>> and it's a frontend-only message. The worker is speaking the backend
>> half of the protocol. We could use it anyway; that wouldn't be silly.
>
> Even if it's a frontend only one it doesn't seem like a bad idea.

Done.

>> I picked ReadyForQuery because that's
>> what the backend sends when it is completely done processing
>> everything that the user most recently requested, which seems
>> defensible here.
>
> I'm pretty sure that we're going to reuse workers within a parallel
> query at some point and ready for query seems like a nicer code for
> saying 'finished with my work, give me the next thing'.

Yeah, could well be.

> Hm. Or we could attach the pid to the error message in that case - just
> like there already is schema_name etc. Or, to stay more in the FE/BE
> vibe - we could just send a 'K' message at startup, which carries
> MyProcPid and MyCancelKey on normal connections.

Done.
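
Concretely, the worker now sends its own BackendKeyData at startup,
along these lines (a sketch of the worker startup code):

    StringInfoData msgbuf;

    pq_beginmessage(&msgbuf, 'K');      /* BackendKeyData */
    pq_sendint(&msgbuf, (int32) MyProcPid, sizeof(int32));
    pq_sendint(&msgbuf, (int32) MyCancelKey, sizeof(int32));
    pq_endmessage(&msgbuf);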

> Replacing "Only" by "When running as a parallel worker we only place a
> ..." would already help. To me the comment currently reads like it
> desperately wishes to be located in the function initiating parallelism
> rather than global file scope. Maybe it's lonely or such.

Heh, heh. Done.

>> > * You're manually passing function names to
>> > PreventCommandIfParallelMode() in a fair number of cases. I'd either
>> > try and keep the names consistent with what the functions are actually
>> > called at the sql level (adding the types in the parens) or just use
>> > PG_FUNCNAME_MACRO to keep them correct.
>>
>> I think putting the type names in is too chatty; I'm not aware that we
>> use that style in other error messages. We don't want to lead people to
>> believe that only the form with the particular argument types they
>> used is off-limits.
>
>> PG_FUNCNAME_MACRO will give us the C name, not the SQL name.
>
> Your manual ones don't either, that's what made me
> complain. E.g. pg_advisory_lock_int8 isn't called that on the SQL
> level. Instead they use the C name + parens.

On further reflection, I think I should probably just change all of
those to use a general message about advisory locks, i.e.:

ERROR: cannot use advisory locks during a parallel operation
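
In code terms, each of those call sites would then reduce to a single
shared check, something like:

    if (IsInParallelMode())
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("cannot use advisory locks during a parallel operation")));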

That sound good to you?

> Which is *not* a good example for the problem. Your primary reasoning
> for needing something more than sharing the locks that we know the
> individual query is going to need (which we acquire in the parse
> analyze/planner/executor) is that we can't predict what some random code
> does. Now, I still don't think that argument should hold too much sway
> because we'll only be able to run carefully controlled code
> *anyway*.

I'll avoid repeating what I've said about this before, except to say
that I still don't believe for a minute you can predict which locks
you'll need.

> But *if* we take it as reasoning for doing more than granting
> the locks we need for the individual query, we *still* need to cope with
> exactly the variants of the above invisible deadlock. You can still
> call a function acquiring some form of lock on either side.

That by itself is not a deadlock. If the worker blocks trying to
acquire a lock that the master held before the start of parallelism,
it will wait forever, even if the master never acquires another lock. But
if the worker blocks trying to acquire a lock that the master acquired
*during* parallelism, it's presumably something like a catalog lock or
a tuple lock or a relation extension lock that the master is going to
release quickly. So the master will finish with that resource and
then the worker will go ahead. You're right that there are other
scenarios to consider, but you need more than that.

>> On the specific issues:
>>
>> 1. I agree that it's very dangerous for the parallel backend to
>> acquire the lock this way if the master no longer holds it.
>> Originally, I was imagining that there would be no interlock between
>> the master shutting down and the worker starting up, but you and
>> others convinced me that was a bad idea. So now transaction commit or
>> abort waits for all workers to be gone, which I think reduces the
>> scope of possible problems here pretty significantly. However, it's
>> quite possible that it isn't airtight. One thing we could maybe do to
>> make it safer is pass a pointer to the initiator's PGPROC. If we get
>> the lock via the fast-path we are safe anyway, but if we have to
>> acquire the partition lock, then we can cross-check that the
>> initiator's lock is still there. I think that would button this up
>> pretty tight.
>
> That'd certainly make me feel less bad about it.

Done.
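
Roughly, in the slow path (LockHeldByInitiator() here is a hypothetical
helper, just to show the shape of the check; the real code would
inspect the initiator's PROCLOCK entry):

    LWLockAcquire(partitionLock, LW_EXCLUSIVE);

    /* Cross-check that the initiator still holds the copied lock. */
    if (!LockHeldByInitiator(initiatorProc, &locktag, lockmode))
        elog(ERROR, "parallel group leader no longer holds this lock");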

>> 2. My reading of the group locking discussions was that everybody
>> agreed that the originally-proposed group locking approach, which
>> involved considering locks from the same locking group as mutually
>> non-conflicting, was not OK. Several specific examples were offered -
>> e.g. it's clearly not OK for two backends to extend a relation at the
>> same time just because they're in the same locking group. So I abandoned that
>> approach. When I instead proposed the approach of copying only the
>> locks that the master *already* had at the beginning of parallelism
>> and considering *only those* as mutually non-conflicting, I believe I got
>> several comments to the effect that this was "less scary".
>> Considering the topic area, I'm not sure I'm going to do any better
>> than that.
>
> It surely is less scary, but still darn scary. It's just friggin hard to
> have a mental model where (self) exclusive locks suddenly aren't that
> anymore. It'll get more and more dangerous the more we start to relax
> restrictions around parallelism - and that *will* come.

Assuming we get a first version committed at some point, yes.

> This concern does not only apply to user-level commands, but also to our
> own C code. We will find more and more reasons to parallelize commands,
> and we will have problems with suddenly not having a chance to
> prevent parallelism during certain actions. Say we parallelize VACUUM
> (grand idea for the index scans!) and someone wants to also improve the
> truncation. Suddenly the AEL during truncation doesn't give us
> guarantees anymore.

If the VACUUM leader process tries to grab an AccessExclusiveLock
after starting parallel mode and before terminating it, surely that's
going to conflict with the locks held by any remaining workers at that
point. Even if it weren't going to conflict, you'd have to be pretty
stupid to code it that way, because you can't remove dead line
pointers until you've finished all of the index scans, and you can't
very well truncate the relation until you've removed dead line
pointers. So to me, this seems like an impossible scenario that
wouldn't break even if it were possible.

> There'll be many more of these, and we can't really
> reason about them because we're used to working locks.

FUD.

I think one thing that might help allay your concerns in this area to
a degree is to discuss under what circumstances we think it's safe to
create a parallel context in the first place. For example, if you
were to insert code that tries to go parallel in the middle of
heap_update() or in the middle of relation extension, that's probably
not going to work out well with this model, or with any model. It's
not *impossible* for it to work out well - if you wanted to extend the
relation by a whole bunch of blocks, the worker could do that while
the master goes and tries to read from CLOG or something, but you'd have
to be awfully careful and it's probably a stupid idea for lots of
reasons. It really only makes sense to enter parallelism when the
backend is in a relatively quiescent state - like at the beginning of
a query, or after vacuum scans the heap but before it scans the
indexes. At that point it's a lot easier to reason about what the
parallel backends can safely do.

Broadly, I think there's a close connection between what state
the master is in when we initiate parallelism and what the workers can
safely do. I'm not sure exactly how to describe what the connection
is there, but I think it exists, and firming up the way we think about
that might make it easier to reason about this.

>> 3. I welcome proposals for other ways of handling this problem, even
>> if they restrict the functionality that can be offered. For example,
>> a proposal that would make parallel_count revert to single-threaded
>> mode but terminate without an indefinite wait would be acceptable to
>> me, provided that it offers some advantage in safety and security over
>> what I've already got.
>
> I think the above example where we only grant the locks required for a
> specific thing and only on the relevant severity would already be a big
> improvement. Even better would be to use the computed set of locks to
> check whether workers could acquire them and refuse parallelism in that
> case.

I think that will just produce lots of very-hard-to-debug undetected
deadlocks and huge volumes of code that tries and fails to assess what
locks the worker will need.

> Another thing to do is to mark the lock, both in the master and workers,
> as not effectively being of the original severity anymore. If the same
> process again tries to acquire the lock at a heavier severity than what
> was needed at the time of query execution, error out. At least until
> parallel mode has been confirmed to have ended. That should pretty much
> never happen.

I'm not sure what you mean by the "severity" of the lock. How about
marking the locks that the worker inherited from the parallel master
and throwing an error if it tries to lock one of those objects in a
mode that it does not already hold? That'd probably catch a sizeable
number of programming errors without tripping up many legitimate use
cases (and we can always relax or modify the prohibition if it later
turns out to be problematic).
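
Entirely illustrative (neither the flag nor the helper below exists),
but the prohibition might look something like:

    /*
     * Hypothetical: on a local lock table miss, refuse to take a new
     * mode on any object whose locks were inherited from the leader.
     */
    if (locallock == NULL && LockTagInheritedFromLeader(&locktag))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("cannot lock an object inherited from the parallel master in a new mode")));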

> I don't see how we can avoid teaching the deadlock detector about all
> these silent deadlocks in any case. By your own reasoning.

We're not seeing eye to eye here yet, since I don't accept your
example scenario and you don't accept mine. Let's keep discussing.

Meanwhile, here's an updated patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: parallel-mode-v5.patch (application/x-patch, 128.1 KB)
