Re: WIP: Upper planner pathification

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Upper planner pathification
Date: 2016-03-01 15:02:07
Message-ID: 10131.1456844527@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> writes:
> On Tue, Mar 1, 2016 at 2:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There are a couple of
>> regression test cases that change plans for the better, but it's sort of
>> accidental. Those cases look like
>>
>> select d.* from d left join (select * from b group by b.id, b.c_id) s
>> on d.a = s.id;
>>
>> and what happens in HEAD is that the subquery chooses a hashagg plan
>> and then the upper query decides a mergejoin would be a good idea ...
>> so it has to sort the output of the hashagg. With the patch, what
>> comes back from the subquery is a Path for the hashagg and a Path
>> for doing the GROUP BY with Sort/Uniq. The second path is more expensive,
>> but it survives the add_path tournament because it can produce sorted
>> output. Then the outer level discovers that it can use that to do its
>> mergejoin without a separate sort step, and that way is cheaper overall.

> This doesn't sound accidental at all. It sounds like a perfect example
> of exactly the benefits of this approach.

Well, my point is that no such path would have been generated if the
subquery hadn't had an internal reason to consider sorting on b.id.
The "accidental" part of this is that the subquery's GROUP BY key
matches what the outer query needs as a mergejoin key.
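
As a rough illustration of that tournament, here is a toy model in Python.
It is not the real add_path() code: the Path class, the dominates() test,
the cost constants, and all the numbers are made up for the example, and
sort orderings are compared far more crudely than real pathkeys are.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Path:
    name: str
    cost: float                          # made-up cost units
    sort_key: Optional[Tuple[str, ...]]  # None means unsorted output


def dominates(a: Path, b: Path) -> bool:
    """a beats b if it is no more expensive and its ordering is as useful."""
    ordering_ok = b.sort_key is None or a.sort_key == b.sort_key
    return a.cost <= b.cost and ordering_ok


def add_path(paths: List[Path], new: Path) -> List[Path]:
    """Keep `new` unless an existing path dominates it; drop paths it dominates."""
    if any(dominates(old, new) for old in paths):
        return paths
    return [old for old in paths if not dominates(new, old)] + [new]


SORT_COST = 40.0   # hypothetical cost of an explicit sort step
MERGE_COST = 20.0  # hypothetical cost of the merge itself


def mergejoin_cost(sub: Path, join_key: Tuple[str, ...]) -> float:
    """Cost of a mergejoin over `sub`, sorting first if its order does not fit."""
    presorted = (sub.sort_key is not None
                 and sub.sort_key[:len(join_key)] == join_key)
    return sub.cost + (0.0 if presorted else SORT_COST) + MERGE_COST


# Subquery level: both GROUP BY strategies are offered to the tournament.
survivors: List[Path] = []
survivors = add_path(survivors, Path("hashagg", 100.0, None))
survivors = add_path(survivors, Path("sort_uniq", 120.0, ("b.id", "b.c_id")))

# Outer level: pick the subquery path that makes the mergejoin cheapest.
best = min(survivors, key=lambda p: mergejoin_cost(p, ("b.id",)))
```

With these invented numbers the sorted path survives despite costing more,
and the outer level prefers it because it avoids the extra sort. If the
GROUP BY key did not start with the join key, the sorted path would give the
mergejoin no advantage, which is the "accidental" part.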

> (Actually the first hunk in the patch kind of surprised me. Do we dump
> node trees with -> notation currently? I thought they normally all
> looked like s-expressions.)

I chose in 19a541143 to not make PathTarget be a subclass of Node,
so that's kind of forced --- we can't print it by recursing to
_outNode(). We could change that but I'm not sure it would be an
improvement. The restarget fields are embedded in RelOptInfo, not
sub-nodes of it, so pretending that they're independent nodes seems
a bit phony in its own way. I'm not wedded to that reasoning though;
if people are more concerned about what pprint() output looks like,
we can change it. Or we could make restarget actually be a subnode,
at the cost of one more palloc per RelOptInfo.

regards, tom lane
