Re: Parallel Seq Scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-09-29 17:19:19
Message-ID: CA+TgmobNgjHESubuEFcimi9PSUuLjga52PgxKGe-6OHsaLtN=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 29, 2015 at 12:39 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Attached patch is a rebased patch based on latest commit (d1b7c1ff)
> for Gather node.
>
> - I have to reorganize the defines in execParallel.h and .c. To keep
> ParallelExecutorInfo, in GatherState node, we need to include execParallel.h
> in execnodes.h which was creating compilation issues as execParallel.h
> also includes execnodes.h, so for now I have defined ParallelExecutorInfo
> in execnodes.h and instrumentation related structures in instrument.h.
> - Renamed parallel_seqscan_degree to degree_of_parallelism
> - Rename Funnel to Gather
> - Removed PARAM_EXEC parameter handling code, I think we can do this
> separately.
>
> I have to work more on partial seq scan patch for rebasing it and handling
> review comments for the same, so for now I am sending the first part of
> patch (which included Gather node functionality and some general support
> for parallel-query execution).

Thanks for the fast rebase.

This patch needs a bunch of cleanup:

- The formatting for the GatherState node's comment block is unlike
that of surrounding comment blocks. It lacks the ------- dividers,
and the indentation is not the same. Also, it refers to
ParallelExecutorInfo by the type name, but the other members by
structure member name. The convention is to refer to them by
structure member name, so please do that.

- The naming of fs_workersReady is inconsistent with the other
structure members. The other members use all lower-case names,
separating words with dashes, but this one uses a capital letter. The
other members also don't prefix the names with anything, but this uses
a "fs_" prefix which I assume is left over from when this was called
FunnelState. Finally, this doesn't actually tell you when workers are
ready, just whether they were launched. I suggest we rename this to
"any_worker_launched".

- Instead of moving the declaration of ParallelExecutorInfo, please
just refer to it as "struct ParallelExecutorInfo" in execnodes.h.
That way, you're not sucking these includes into all kinds of places
they don't really need to be.

- Let's not create a new PARALLEL_QUERY category of GUC. Instead,
let's the GUC for the number of workers with under resource usage ->
asynchronous behavior.

- I don't believe that shm_toc *toc has any business being part of a
generic PlanState node. At most, it should be part of an individual
type of PlanState, like a GatherState or PartialSeqScanState. But
really, I don't see why we need it there at all. It should, I think,
only be needed during startup to dig out the information we need. So
we should just dig that stuff out and keep pointers to whatever we
actually need - in this case the ParallelExecutorInfo, I think - in
the particular type of PlanState node that's at issue - here
GatherState. After that we don't need a pointer to the toc any more.

- I'd like to do some renaming of the new GUCs. I suggest we rename
cpu_tuple_comm_cost to parallel_tuple_cost and degree_of_parallelism
to max_parallel_degree.

- I think that a Gather node should inherit from Plan, not Scan. A
Gather node really shouldn't have a scanrelid. Now, admittedly, if
the only thing under the Gather is a Partial Seq Scan, it wouldn't be
totally bonkers to think of the Gather as scanning the same relation
that the Partial Seq Scan is scanning. But in any more complex case,
like where it's scanning a join, you're out of luck. You'll have to
set scanrelid == 0, I suppose, but then, for example, ExecScanReScan
is not going to work. In fact, as far as I can see, the only way
nodeGather.c is actually using any of the generic scan stuff is by
calling ExecInitScanTupleSlot, which is all of one line of code.
ExecEndGather fetches node->ss.ss_currentRelation but then does
nothing with it. So I think this is just a holdover from early
version of this patch where what's now Gather and PartialSeqScan were
a single node, and I think we should rip it out.

- On a related note, the assertions in cost_gather() are both bogus
and should be removed. Similarly with create_gather_plan(). As
previously mentioned, the Gather node should not care what sort of
thing is under it; I am not interested in restricting it to baserels
and then undoing that later.

- For the same reason, cost_gather() should refer to it's third
argument as "rel" not "baserel".

- Also, I think this stuff about physical tlists in
create_gather_plan() is bogus. use_physical_tlist is ignorant of the
possibility that the RelOptInfo passed to it might be anything other
than a baserel, and I think it won't be happy if it gets a joinrel.
Moreover, I think our plan here is that, at least for now, the
Gather's tlist will always match the tlist of its child. If that's
so, there's no point to this: it will end up with the same tlist
either way. If any projection is needed, it should be done by the
Gather node's child, not the Gather node itself.

- Let's rename DestroyParallelSetupAndAccumStats to
ExecShutdownGather. Instead of encasing the entire function in if
statement, let's start with if (node->pei == NULL || node->pei->pcxt
== NULL) return.

- ExecParallelBufferUsageAccum should be declared to take an argument
of type PlanState, not Node. Then you don't have to cast what you are
passing to it, and it doesn't have to cast before calling itself. And,
let's also rename it to ExecShutdownNode and move it to
execProcnode.c. Having a "shutdown phase" that stops a node from
asynchronously consuming additional resources could be useful for
non-parallel node types - especially ForeignScan and CustomScan. And
we could eventually extend this to be called in other situations, like
when a Limit is filled give everything beneath it a chance to ease up.
We don't have to do those bits of work right now but it seems well
worth making this look like a generic facility.

- Calling DestroyParallelSetupAndAccumStats from ExplainNode when we
actually reach the Gather node is much too late. We should really be
shutting down parallel workers at the end of the ExecutorRun phase, or
certainly no later than ExecutorFinish. In fact, you have
standard_ExecutorRun calling ExecParallelBufferUsageAccum() but only
if queryDesc->totaltime is set. What I think you should do instead is
call ExecShutdownNode a few lines earlier, before shutting down the
tuple receiver, and do so unconditionally. That way, the workers are
always shut down in the ExecutorRun phase, which should eliminate the
need for this bit in explain.c.

- The changes to postmaster.c and postgres.c consist of only
additional #includes. Those can, presumably, be reverted.

Other than that, hah hah, it looks pretty cool.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-09-29 17:27:15 Re: No Issue Tracker - Say it Ain't So!
Previous Message Corey Huinker 2015-09-29 17:18:11 Re: No Issue Tracker - Say it Ain't So!