Re: [DESIGN] ParallelAppend

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: [DESIGN] ParallelAppend
Date: 2015-11-18 18:57:34
Message-ID: CA+TgmoYzGJ208n-pzQCx71e139o5XMQ6TJLaotRPZYm-N7=BPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 18, 2015 at 7:25 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Don't we need the startup cost incase we need to build partial paths for
> joinpaths like mergepath?
> Also, I think there are other cases for single relation scan where startup
> cost can matter like when there are psuedoconstants in qualification
> (refer cost_qual_eval_walker()) or let us say if someone has disabled
> seq scan (disable_cost is considered as startup cost.)

I'm not saying that we don't need to compute it. I'm saying we don't
need to take it into consideration when deciding which paths have
merit. Note that consider_statup is set this way:

rel->consider_startup = (root->tuple_fraction > 0);

And for a joinrel:

joinrel->consider_startup = (root->tuple_fraction > 0);

root->tuple_fraction is 0 when we expect all tuples to be retrieved,
and parallel query can currently only be used when all tuples will be
retrieved.

> B. I think partial path is an important concept and desrves some
> explanation in src/backend/optimizer/README.
> There is already a good explanation about Paths, so I think it
> seems that it is better to add explanation about partial paths.

Good idea. In the attached, revised version of the patch, I've added
a large new section to that README.

> 2.
> + * costs: parallelism is only used for plans that will be run to
> completion.
> + * Therefore, this
> routine is much simpler than add_path: it needs to
> + * consider only pathkeys and total cost.
>
> There seems to be some spacing issue in last two lines.

Fixed.

> A.
> This means that for inheritance child relations for which rel pages are
> less than parallel_threshold, it will always consider the cost shared
> between 1 worker and leader as per below calc in cost_seqscan:
> if (path->parallel_degree > 0)
> run_cost = run_cost / (path->parallel_degree + 0.5);
>
> I think this might not be the appropriate cost model for even for
> non-inheritence relations which has pages more than parallel_threshold,
> but it seems to be even worst for inheritance children which have
> pages less than parallel_threshold

Why? I'm certainly open to patches to improve the cost model, but I
don't see why this patch needs to do that.

> B.
> Will it be possible that if none of the inheritence child rels (or very few
> of them) are big enough for parallel scan, then considering Append
> node for parallelism of any use or in other words, won't it be better
> to generate plan as it is done now without this patch for such cases
> considering current execution model of Gather node?

I think we should instead extend the Append node as suggested by
KaiGai, so that it can have both partial and non-partial children. I
think we can leave that to another patch, though. Aside from
questions of what the fastest plan are, it's very bad to have Gather
nodes under the Append, because the Append could have many children
and we could end up with a ton of Gather nodes, each using a DSM and a
bunch of workers. That's important to avoid.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
gather-append-pushdown-v2.patch text/x-diff 33.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2015-11-18 19:03:25 Re: LLVM miscompiles numeric.c access to short numeric var headers
Previous Message Jeff Janes 2015-11-18 18:31:45 Re: Using quicksort for every external sort run