Re: [DESIGN] ParallelAppend

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: [DESIGN] ParallelAppend
Date: 2015-08-07 08:45:21
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8011300E4@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> On Sat, Aug 1, 2015 at 6:39 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> >
> > > On Tue, Jul 28, 2015 at 6:08 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> > >
> > > I am not sure, but what problem do you see in putting a Funnel node
> > > on one of the relation scans and not on the others?
> > >
> > At this moment, I'm not certain whether a background worker can, or
> > ought to, launch other background workers.
> > If a sub-Funnel node is executed by 10 processes and each of them also
> > launches 10 processes, will 100 processes run in total?
> >
>
> Yes, that could be more work than the current design, but that is not
> what I had in mind; rather, I was thinking that only the master backend
> will kick off workers for the Funnel nodes in the plan.
>
I agree; that is a fair enough approach, which is why I mentioned
pulling up the Funnel node.

> > > > If we pull Funnel here, I think the plan shall be as follows:
> > > > Funnel
> > > > --> SeqScan on rel1
> > > > --> PartialSeqScan on rel2
> > > > --> IndexScan on rel3
> > > >
> > >
> > > So if we go this route, then Funnel should have capability
> > > to execute non-parallel part of plan as well, like in this
> > > case it should be able to execute non-parallel IndexScan on
> > > rel3 as well and then it might need to distinguish between
> > > parallel and non-parallel part of plans. I think this could
> > > make Funnel node complex.
> > >
> > That is different from what I plan now. In the above example, the
> > Funnel node has two non-parallel-aware nodes (rel1 and rel3) and one
> > parallel-aware node (rel2), so three PlannedStmts, one for each, are
> > put on the TOC segment. Each background worker picks up one of the
> > three PlannedStmts, but only one worker each can pick up the
> > PlannedStmts for rel1 and rel3, whereas rel2 can be executed by
> > multiple workers simultaneously.
>
> Okay, now I got your point, but I think the cost of execution
> of non-parallel node by additional worker is not small considering
> the communication cost and setting up an additional worker for
> each sub-plan (assume the case where out of 100-child nodes
> only few (2 or 3) nodes actually need multiple workers).
>
It is a competition between a traditional Append that takes Funnel
children and the new appendable Funnel that takes both parallel and
non-parallel children. The key factors are probably cpu_tuple_comm_cost,
parallel_setup_cost, and the selectivity of the sub-plans.
Each approach has advantages and disadvantages depending on the query,
so we cannot determine which is better without considering both as paths.
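
To make the trade-off concrete, here is a rough toy model in Python. It is
not the planner's costing and not code from any patch: the formulas, the
SubPlan fields, and both function names are assumptions made up for
illustration. Only the two parameter names, parallel_setup_cost and
cpu_tuple_comm_cost, come from the discussion above.

```python
from dataclasses import dataclass


@dataclass
class SubPlan:
    run_cost: float       # cost to run the sub-plan in a single process
    rows: float           # tuples it returns after its qualifiers
    parallel_aware: bool  # True if several workers can share it


def append_over_funnels(subplans, workers, parallel_setup_cost, cpu_tuple_comm_cost):
    """Traditional Append: each parallel-aware child sits under its own Funnel."""
    total = 0.0
    for sp in subplans:
        if sp.parallel_aware:
            # Pay setup once per Funnel, split the work, ship every tuple back.
            total += (parallel_setup_cost
                      + sp.run_cost / workers
                      + sp.rows * cpu_tuple_comm_cost)
        else:
            # The leader runs the child itself, so no tuple transfer is needed.
            total += sp.run_cost
    return total


def appendable_funnel(subplans, workers, parallel_setup_cost, cpu_tuple_comm_cost):
    """One Funnel whose workers pick up parallel and non-parallel children."""
    total_work = sum(sp.run_cost for sp in subplans)
    # A non-parallel child cannot be split, so it bounds the elapsed cost.
    longest_serial = max(
        (sp.run_cost for sp in subplans if not sp.parallel_aware), default=0.0)
    elapsed = max(total_work / workers, longest_serial)
    # Every tuple from every child crosses the worker-to-leader queue.
    comm = sum(sp.rows for sp in subplans) * cpu_tuple_comm_cost
    return parallel_setup_cost + elapsed + comm
```

In this toy model, highly selective children favor the single appendable
Funnel, because the children overlap in time and few tuples are shipped.
When the non-parallel children return many tuples, the per-tuple
communication term dominates and Append over Funnel children wins.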

> > > I think for a particular PlannedStmt, number of workers must have
> > > been decided before start of execution, so if those many workers are
> > > available to work on that particular PlannedStmt, then next/new
> > > worker should work on next PlannedStmt.
> > >
> > My concern about a pre-determined number of workers is that it depends
> > on the run-time circumstances of concurrent sessions. Even if the planner
> > wants to assign 10 workers to a particular sub-plan, only 4 workers may be
> > available at run time because other sessions are consuming them.
> > So I expect that only the maximum number of workers is a meaningful setting.
> >
>
> In that case, there is a possibility that many of the workers are just
> working on one or two of the nodes and the execution of other nodes might
> get starved. I understand that allocating workers to different nodes is a
> tricky problem; however, we should try to develop some algorithm
> that gives some degree of fairness in the allocation of workers
> to different nodes.
>
I would like to agree; however, I also want to keep the first version as
simple as we can. We can develop alternative logic to assign a
suitable number of workers later.
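
As an illustration of how simple a first-version policy could be, the
sketch below (Python, not backend C, and not taken from any patch) hands
each worker that actually starts to a sub-plan nobody has claimed yet, and
sends any extra workers to a parallel-aware sub-plan. SubPlanSlot and
pick_subplan are hypothetical names, and the sketch ignores sub-plans
that have already finished.

```python
from dataclasses import dataclass


@dataclass
class SubPlanSlot:
    name: str
    parallel_aware: bool  # True if more than one worker may run it
    workers: int = 0      # workers currently attached to this sub-plan


def pick_subplan(slots):
    """Choose a sub-plan for the next worker that becomes available.

    Returns the chosen slot, or None if the worker has nothing to do.
    """
    # First, hand out any sub-plan that nobody has started yet, so a
    # non-parallel-aware sub-plan is claimed by exactly one worker.
    for slot in slots:
        if slot.workers == 0:
            slot.workers += 1
            return slot
    # Everything is claimed: join the least-loaded parallel-aware sub-plan.
    candidates = [s for s in slots if s.parallel_aware]
    if not candidates:
        return None
    slot = min(candidates, key=lambda s: s.workers)
    slot.workers += 1
    return slot
```

With the rel1/rel2/rel3 example and four workers available at run time,
this policy gives one worker each to rel1 and rel3 and two to rel2. It
needs no per-sub-plan worker count from the planner, only the number of
workers that actually start, but it makes no attempt at the fairness
discussed above.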

> > > 2. Execution of work by workers and Funnel node and then pass
> > > the results back to upper node. I think this needs some more
> > > work in addition to ParallelSeqScan patch.
> > >
> > I expect we can utilize existing infrastructure here. It just picks
> > up the records coming from the underlying workers, then passes them
> > to the upper node.
> >
>
>
> Sure, but you still need some work, at least in the area of making
> workers understand different node types. I am guessing you need
> to modify readfuncs.c to support the new plan node, if any, for this
> work.
>
Yes, it was not creative work. :-)
https://github.com/kaigai/sepgsql/blob/fappend/src/backend/nodes/readfuncs.c#L1479

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
