Re: Parallel Seq Scan

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Jeff Davis <pgsql(at)j-davis(dot)com>, "Andres Freund" <andres(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, "Amit Langote" <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-10-01 06:35:01
Message-ID: 9A28C8860F777E439AA12E8AEA7694F80114D55B@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi Robert,

Gather node was oversight by readfunc.c, even though it shall not be
serialized actually.
Also, it used incompatible WRITE_xxx_FIELD() macro on outfuncs.c.

The attached patch fixes both of incomsistence.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Robert Haas
> Sent: Wednesday, September 30, 2015 2:19 AM
> To: Amit Kapila
> Cc: Kaigai Kouhei(海外 浩平); Haribabu Kommi; Gavin Flower; Jeff Davis; Andres
> Freund; Amit Langote; Amit Langote; Fabrízio Mello; Thom Brown; Stephen Frost;
> pgsql-hackers
> Subject: Re: [HACKERS] Parallel Seq Scan
>
> On Tue, Sep 29, 2015 at 12:39 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > Attached patch is a rebased patch based on latest commit (d1b7c1ff)
> > for Gather node.
> >
> > - I have to reorganize the defines in execParallel.h and .c. To keep
> > ParallelExecutorInfo, in GatherState node, we need to include execParallel.h
> > in execnodes.h which was creating compilation issues as execParallel.h
> > also includes execnodes.h, so for now I have defined ParallelExecutorInfo
> > in execnodes.h and instrumentation related structures in instrument.h.
> > - Renamed parallel_seqscan_degree to degree_of_parallelism
> > - Rename Funnel to Gather
> > - Removed PARAM_EXEC parameter handling code, I think we can do this
> > separately.
> >
> > I have to work more on partial seq scan patch for rebasing it and handling
> > review comments for the same, so for now I am sending the first part of
> > patch (which included Gather node functionality and some general support
> > for parallel-query execution).
>
> Thanks for the fast rebase.
>
> This patch needs a bunch of cleanup:
>
> - The formatting for the GatherState node's comment block is unlike
> that of surrounding comment blocks. It lacks the ------- dividers,
> and the indentation is not the same. Also, it refers to
> ParallelExecutorInfo by the type name, but the other members by
> structure member name. The convention is to refer to them by
> structure member name, so please do that.
>
> - The naming of fs_workersReady is inconsistent with the other
> structure members. The other members use all lower-case names,
> separating words with dashes, but this one uses a capital letter. The
> other members also don't prefix the names with anything, but this uses
> a "fs_" prefix which I assume is left over from when this was called
> FunnelState. Finally, this doesn't actually tell you when workers are
> ready, just whether they were launched. I suggest we rename this to
> "any_worker_launched".
>
> - Instead of moving the declaration of ParallelExecutorInfo, please
> just refer to it as "struct ParallelExecutorInfo" in execnodes.h.
> That way, you're not sucking these includes into all kinds of places
> they don't really need to be.
>
> - Let's not create a new PARALLEL_QUERY category of GUC. Instead,
> let's the GUC for the number of workers with under resource usage ->
> asynchronous behavior.
>
> - I don't believe that shm_toc *toc has any business being part of a
> generic PlanState node. At most, it should be part of an individual
> type of PlanState, like a GatherState or PartialSeqScanState. But
> really, I don't see why we need it there at all. It should, I think,
> only be needed during startup to dig out the information we need. So
> we should just dig that stuff out and keep pointers to whatever we
> actually need - in this case the ParallelExecutorInfo, I think - in
> the particular type of PlanState node that's at issue - here
> GatherState. After that we don't need a pointer to the toc any more.
>
> - I'd like to do some renaming of the new GUCs. I suggest we rename
> cpu_tuple_comm_cost to parallel_tuple_cost and degree_of_parallelism
> to max_parallel_degree.
>
> - I think that a Gather node should inherit from Plan, not Scan. A
> Gather node really shouldn't have a scanrelid. Now, admittedly, if
> the only thing under the Gather is a Partial Seq Scan, it wouldn't be
> totally bonkers to think of the Gather as scanning the same relation
> that the Partial Seq Scan is scanning. But in any more complex case,
> like where it's scanning a join, you're out of luck. You'll have to
> set scanrelid == 0, I suppose, but then, for example, ExecScanReScan
> is not going to work. In fact, as far as I can see, the only way
> nodeGather.c is actually using any of the generic scan stuff is by
> calling ExecInitScanTupleSlot, which is all of one line of code.
> ExecEndGather fetches node->ss.ss_currentRelation but then does
> nothing with it. So I think this is just a holdover from early
> version of this patch where what's now Gather and PartialSeqScan were
> a single node, and I think we should rip it out.
>
> - On a related note, the assertions in cost_gather() are both bogus
> and should be removed. Similarly with create_gather_plan(). As
> previously mentioned, the Gather node should not care what sort of
> thing is under it; I am not interested in restricting it to baserels
> and then undoing that later.
>
> - For the same reason, cost_gather() should refer to it's third
> argument as "rel" not "baserel".
>
> - Also, I think this stuff about physical tlists in
> create_gather_plan() is bogus. use_physical_tlist is ignorant of the
> possibility that the RelOptInfo passed to it might be anything other
> than a baserel, and I think it won't be happy if it gets a joinrel.
> Moreover, I think our plan here is that, at least for now, the
> Gather's tlist will always match the tlist of its child. If that's
> so, there's no point to this: it will end up with the same tlist
> either way. If any projection is needed, it should be done by the
> Gather node's child, not the Gather node itself.
>
> - Let's rename DestroyParallelSetupAndAccumStats to
> ExecShutdownGather. Instead of encasing the entire function in if
> statement, let's start with if (node->pei == NULL || node->pei->pcxt
> == NULL) return.
>
> - ExecParallelBufferUsageAccum should be declared to take an argument
> of type PlanState, not Node. Then you don't have to cast what you are
> passing to it, and it doesn't have to cast before calling itself. And,
> let's also rename it to ExecShutdownNode and move it to
> execProcnode.c. Having a "shutdown phase" that stops a node from
> asynchronously consuming additional resources could be useful for
> non-parallel node types - especially ForeignScan and CustomScan. And
> we could eventually extend this to be called in other situations, like
> when a Limit is filled give everything beneath it a chance to ease up.
> We don't have to do those bits of work right now but it seems well
> worth making this look like a generic facility.
>
> - Calling DestroyParallelSetupAndAccumStats from ExplainNode when we
> actually reach the Gather node is much too late. We should really be
> shutting down parallel workers at the end of the ExecutorRun phase, or
> certainly no later than ExecutorFinish. In fact, you have
> standard_ExecutorRun calling ExecParallelBufferUsageAccum() but only
> if queryDesc->totaltime is set. What I think you should do instead is
> call ExecShutdownNode a few lines earlier, before shutting down the
> tuple receiver, and do so unconditionally. That way, the workers are
> always shut down in the ExecutorRun phase, which should eliminate the
> need for this bit in explain.c.
>
> - The changes to postmaster.c and postgres.c consist of only
> additional #includes. Those can, presumably, be reverted.
>
> Other than that, hah hah, it looks pretty cool.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

Attachment Content-Type Size
pgsql-gather-on-readfunc.v1.patch application/octet-stream 1.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2015-10-01 06:38:07 Re: Foreign join pushdown vs EvalPlanQual
Previous Message Josh Berkus 2015-10-01 04:02:19 Re: Idea for improving buildfarm robustness