Re: [DESIGN] ParallelAppend

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: [DESIGN] ParallelAppend
Date: 2015-10-30 16:35:26
Message-ID: 9A28C8860F777E439AA12E8AEA7694F801160812@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> On Wed, Oct 28, 2015 at 3:55 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> > At PGConf.EU, I had a chance to talk with Robert about this topic,
> > and it became clear that we have the same idea.
> >
> >>   +---------+
> >>   |sub-plan |       * Sub-Plan 1 ... Index Scan on p1
> >>   |index on *-----> * Sub-Plan 2 ... PartialSeqScan on p2
> >>   |shared   |       * Sub-Plan 2 ... PartialSeqScan on p2
> >>   |memory   |       * Sub-Plan 2 ... PartialSeqScan on p2
> >>   +---------+       * Sub-Plan 3 ... Index Scan on p3
> >>
> > In the above example, each non-parallel sub-plan uses only
> > 1 slot of the array, whereas the PartialSeqScan takes 3 slots.
> > It is a strict rule: a non-parallel-aware sub-plan can be picked
> > up only once.
> > The index into the sub-plan array is initialized to 0, then
> > incremented (up to 5) by each worker when it processes the
> > parallel-aware Append.
> > So, once a worker takes a non-parallel sub-plan, no other worker
> > can take the same slot again; thus, no duplicate rows will be
> > produced by a non-parallel sub-plan in the parallel-aware Append.
> > Also, this array structure prevents too many workers from
> > picking up a particular parallel-aware sub-plan, because the
> > PartialSeqScan occupies 3 slots; that means at most three workers
> > can pick up this sub-plan. If the 1st worker took the IndexScan on
> > p1, and the 2nd-4th workers took the PartialSeqScan on p2, then the
> > 5th worker (if any) will pick up the IndexScan on p3 even if the
> > PartialSeqScan on p2 has not completed.
>
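For reference, the slot-array idea quoted above can be modelled roughly as follows. This is only a toy simulation in Python, not PostgreSQL code: the names `SlotArray`, `claim` and `build_example` are hypothetical, and a lock stands in for whatever atomic increment on shared memory a real implementation would use.

```python
import threading


class SlotArray:
    """Hypothetical model of the shared sub-plan slot array (not PostgreSQL code)."""

    def __init__(self, subplans):
        # subplans: list of (name, nslots). A non-parallel sub-plan gets one
        # slot; a parallel-aware one gets one slot per worker it may accept.
        self.slots = [name for name, nslots in subplans for _ in range(nslots)]
        self.next_index = 0           # shared index, initialized to 0
        self.lock = threading.Lock()  # assumption: stands in for an atomic increment

    def claim(self):
        """Return the sub-plan in the next free slot, or None when all are taken."""
        with self.lock:
            if self.next_index >= len(self.slots):
                return None
            name = self.slots[self.next_index]
            self.next_index += 1
            return name


def build_example():
    # The five-slot layout from the diagram: p1 once, p2 three times, p3 once.
    return SlotArray([
        ("Index Scan on p1", 1),
        ("PartialSeqScan on p2", 3),
        ("Index Scan on p3", 1),
    ])
```

Each worker would call `claim()` once per sub-plan it wants to run. Because a slot is handed out only once, the two Index Scans are never executed twice, and at most three workers join the PartialSeqScan.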
> Actually, this is not exactly what I had in mind. I was thinking that
> we'd have a single array whose length is equal to the number of Append
> subplans, and each element of the array would be a count of the number
> of workers executing that subplan. So there wouldn't be multiple
> entries for the same subplan, as you propose here. To distinguish
> between parallel-aware and non-parallel-aware plans, I plan to put a
> Boolean flag in the plan itself.
>
I don't have a strong preference here. Both designs can implement the
requirement: non-parallel sub-plans are never picked up twice, and
parallel-aware sub-plans can be picked up multiple times.
So, I'll start with your suggestion above.
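A rough model of the per-sub-plan counter design, again as a toy Python simulation rather than PostgreSQL code. Several details here are assumptions not stated in this thread: the "fewest workers first" selection policy, the `finished` flag, the fact that counts are never decremented, and all of the names (`SubPlan`, `WorkerCounts`, `choose`, `mark_finished`).

```python
import threading
from dataclasses import dataclass


@dataclass
class SubPlan:
    name: str
    parallel_aware: bool    # the Boolean flag kept in the plan itself
    finished: bool = False  # assumption: completion tracking is not in the thread


class WorkerCounts:
    """Hypothetical model: one worker count per Append sub-plan."""

    def __init__(self, subplans):
        self.subplans = subplans
        self.counts = [0] * len(subplans)  # array length == number of sub-plans
        self.lock = threading.Lock()       # assumption: stands in for shared-memory locking

    def choose(self):
        """Return the index of a sub-plan to run, or None if none is eligible."""
        with self.lock:
            best = None
            for i, sp in enumerate(self.subplans):
                if sp.finished:
                    continue
                if not sp.parallel_aware and self.counts[i] > 0:
                    continue  # non-parallel-aware: only one worker, ever
                # Assumed policy: prefer the sub-plan with the fewest workers.
                if best is None or self.counts[i] < self.counts[best]:
                    best = i
            if best is not None:
                self.counts[best] += 1  # never decremented in this sketch
            return best

    def mark_finished(self, i):
        with self.lock:
            self.subplans[i].finished = True
```

With the same three sub-plans as in the earlier diagram, five successive `choose()` calls would hand out p1, p2, p3, p2, p2 under this assumed policy. Unlike the slot array, nothing here caps the number of workers on p2, so a real implementation would need some limit or cost-based choice.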

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
