Re: WIP: Upper planner pathification

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Upper planner pathification
Date: 2016-03-09 12:30:22
Message-ID: CAA4eK1KPmF0hU5yz7yP7ss6uXp99Ba5mJJ5+948xXmV=wt6ubw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 8, 2016 at 2:31 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
> >> Attached is a version that addresses today's concerns, and also
finishes
> >> filling in the loose ends I'd left before, such as documentation and
> >> outfuncs.c support. I think this is in a committable state now, though
> >> I plan to read through the whole thing again.
>
> The extra read-through located some minor bugs, mainly places where I'd
> forgotten to ensure that Path cost info was transposed into the generated
> Plan. That would only have the cosmetic effect that EXPLAIN would print
> zeroes for estimated costs, and since we only use EXPLAIN COSTS OFF in
> the regression tests, no test failures ensued :-(.
>
> I've pushed it now; we'll soon see if the buildfarm finds any problems.
>

On latest commit-51c0f63e, I am seeing some issues w.r.t parallel query.
Consider a below case:

create table t1(c1 int, c2 char(1000));
insert into t1 values(generate_series(1,300000),'aaaa');
analyze t1;
set max_parallel_degree=2;
postgres=# explain select c1, count(c1) from t1 where c1 < 1000 group by c1;
ERROR: ORDER/GROUP BY expression not found in targetlist

Without setting max_parallel_degree, it works fine and generate the
appropriate results. Here the issue seems to be that the code in
grouping_planner doesn't apply the required PathTarget to Path below Gather
Path due to which when we generate target list for scan plan, it misses the
required information which in this case is sortgrouprefs and the same
target list is then propagated for upper nodes which eventually leads to
the above mentioned failure. Due to same reason, if the target list
contains some expression, it will give wrong results when parallel query is
used. I could see below ways to solve this issue.

Approach-1
-----------------
First way to solve this issue is to jam the PathTarget for partial paths
similar to what we are doing for Paths and I have verified that resolves
the issue, the patch for same is attached with this mail. However, it
won't work as-is, because this will make target lists pushed to workers as
we have applied them below Gather Path which we don't want if the target
list has any parallel unsafe functions. To make this approach work, we
need to ensure that we jam the PathTarget for partial paths (or generate a
projection) only if they contain parallel-safe expressions. Now while
creating Gather Plan (create_gather_plan()), we need to ensure in some way
that if path below doesn't generate the required tlist, then it should
generate it's own tlist rather than using it from subplan.

Approach-2
------------------
Always generate a tlist in Gather plan as we do for many other cases. I
think this approach will also resolve the issue but haven't tried yet.

Approach-1 has a benefit that it can allow to push target lists to workers
which will result in speed-up of parallel queries especially for cases when
target list contain costly expressions, so I am slightly inclined to follow
that even though that looks more work.

Thoughts?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
apply_tlist_partial_path_v1.patch application/octet-stream 1.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2016-03-09 12:33:54 Re: More stable query plans via more predictable column statistics
Previous Message Konstantin Knizhnik 2016-03-09 12:29:57 Re: Optimizer questions