Re: parallel mode and parallel contexts

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel mode and parallel contexts
Date: 2015-02-08 00:20:27
Message-ID: 20150208002027.GH9201@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2015-02-06 22:43:21 -0500, Robert Haas wrote:
> Here's v4, with that fixed and a few more tweaks.

If you attached files generated with 'git format-patch' I could directly
apply then with the commit message and such. All at once if it's
mutliple patches, as individual commits. On nontrivial patches it's nice
to see the commit message(s) along the diff(s).

Observations:
* Some tailing whitespace in the readme. Very nice otherwise.

* Don't like CreateParallelContextForExtension much as a function name -
I don't think we normally equate the fact that code is located in a
loadable library with the extension mechanism.

* StaticAssertStmt(BUFFERALIGN(PARALLEL_ERROR_QUEUE_SIZE) ...) what's
that about?

* the plain shm calculations intentionally use mul_size/add_size to deal
with overflows. On 32bit it doesn't seem impossible, but unlikely to
overflow size_t.

* I'd s/very // i "We might be running in a very short-lived memory
context.". Or replace it with "too".

* In +LaunchParallelWorkers, does it make sense trying to start workers
if one failed? ISTM that's likely to not be helpful. I.e. it should
just break; after the first failure.

* +WaitForParallelWorkersToFinish says that it waits for workers to exit
cleanly. To me that's ambiguous. How about "fully"?

* ParallelMain restores libraries before GUC state. Given that
librararies can, and actually somewhat frequently do, inspect GUCs on
load, it seems better to do it the other way round? You argue "We want
to do this before restoring GUCs, because the libraries might define
custom variables.", but I don't buy that. It's completely normal for
namespaced GUCs to be present before a library is loaded? Especially
as you then go and process_session_preload_libraries() after setting
the GUCs.

* Should ParallelMain maybe enter the parallel context before loading
user defined libraries? It's far from absurd to execute code touching
the database on library initialization...

* rename ParallelMain to ParallelWorkerMain?

* I think restoring snapshots needs to fudge the worker's PGXACT->xmin
to be the minimum of the top transaction id and the
snapshots. Otherwise if the initiating process dies badly
(e.g. because postmaster died) the workers will continue to work,
while other processes may remove things. Also, you don't seem to
prohibit popping the active snapshot (should that be prohibitted
maybe?) which might bump the initiator's xmin horizon.

* Comment about 'signal_handler' in +HandleParallelMessageInterrupt
is outdated.

* Is it really a good idea to overlay Z/ReadyForQuery with 'this worker
exited cleanly'? Shouldn't it at least be T/Terminate? I'm generally
not sure it's wise to use this faux FE/BE protocol here...

* HandlingParallelMessages looks dead.

* ParallelErrorContext has the wrong comment.

* ParallelErrorContext() provides the worker's pid in the context
message. I guess that's so it's easier to interpret when sent to the
initiator? It'll look odd when logged by the failing process.

* We now have pretty much the same handle_sigterm in a bunch of
places. Can't we get rid of those? You actually probably can just use
die().

* The comments in xact.c above XactTopTransactionId pretty much assume
that the reader knows that that is about parallel stuff.

* I'm a bit confused by the fact that Start/CommitTransactionCommand()
emit a generic elog when encountering a PARALLEL_INPROGRESS, whereas
ReleaseSavepoint()/RollbackTo has a special message. Shouldn't it be
pretty much impossible to hit a ReleaseSavepoint() with a parallel
transaction in progress? We'd have had to end the previous transaction
command while parallel stuff was in progress - right?
(Internal/ReleaseCurrentSubTransaction is different, that's used in code)

* Why are you deviating from the sorting order used in other places for
ParallelCurrentXids? That seems really wierd, especially as we use
something else a couple lines down. Especially as you actually seem to
send stuff in xidComparator order?

* I don't think skipping AtEOXact_Namespace() entirely if parallel is a
good idea. That function already does some other stuff than cleaning
up temp tables. I think you should pass in parallel and do the skip in
there.

* Start/DndParallelWorkerTransaction assert the current state, whereas
the rest of the file FATALs in that case. I think it'd actually be
good to be conservative and do the same in this case.

* You're manually passing function names to
PreventCommandIfParallelMode() in a fair number of cases. I'd either
try and keep the names consistent with what the functions are actually
called at the sql level (adding the types in the parens) or just use
PG_FUNCNAME_MACRO to keep them correct.

* Wait. You now copy all held relation held "as is" to the standby? I
quite doubt that's a good idea, and it's not my reading of the
conclusions made in the group locking thread. At the very least this
part needs to be extensively documented. And while
LockAcquireExtended() refers to
src/backend/access/transam/README.parallel for details I don't see
anything pertinent in there. And the function header sounds like the
only difference is the HS logging - not mentioning that it essentially
disables lock queuing entirely.

This seems unsafe (e.g. consider if the initiating backend died and
somebody else acquired the lock, possible e.g. if postmaster died) and
not even remotely enough discussed. I think this should be removed
from the patch for now.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Karlsson 2015-02-08 01:05:46 Re: PATCH: Reducing lock strength of trigger and foreign key DDL
Previous Message Jeff Davis 2015-02-08 00:08:08 9.6 Feature help requested: Inclusion Constraints