Re: Pull up aggregate sublink (was: Parameterized aggregate subquery (was: Pull up aggregate subquery))

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pull up aggregate sublink (was: Parameterized aggregate subquery (was: Pull up aggregate subquery))
Date: 2011-07-27 14:40:23
Message-ID: 4E302357.90704@gmail.com
Lists: pgsql-hackers

On 2011-07-27 16:16, Robert Haas wrote:
> On Tue, Jul 26, 2011 at 5:37 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeb Havinga <yebhavinga(at)gmail(dot)com> writes:
>>> A few days ago I read Tomas Vondra's blog post about dss tpc-h queries
>>> on PostgreSQL at
>>> http://fuzzy.cz/en/articles/dss-tpc-h-benchmark-with-postgresql/ - in
>>> which he showed how to manually pull up a dss subquery to get a large
>>> speed up. Initially I thought: cool, this is probably now handled by
>>> Hitoshi's patch, but it turns out the subquery type in the dss query is
>>> different.
>> Actually, I believe this example is the exact opposite of the
>> transformation Hitoshi proposes. Tomas was manually replacing an
>> aggregated subquery by a reference to a grouped table, which can be
>> a win if the subquery would be executed enough times to amortize
>> calculation of the grouped table over all the groups (some of which
>> might never be demanded by the outer query). Hitoshi was talking about
>> avoiding calculations of grouped-table elements that we don't need,
>> which would be a win in different cases. Or at least that was the
>> thrust of his original proposal; I'm not sure where the patch went since
>> then.
>>
>> This leads me to think that we need to represent both cases as the same
>> sort of query and make a cost-based decision as to which way to go.
>> Thinking of it as a pull-up or push-down transformation is the wrong
>> approach because those sorts of transformations are done too early to
>> be able to use cost comparisons.
> I think you're right. OTOH, our estimates of what will pop out of an
> aggregate are so poor that denying the user to control the plan on the
> basis of how they write the query might be a net negative. :-(
>

Tom and Robert, thank you both for your replies. I think I have some
blind spots, and maybe false assumptions, regarding the overall work in the
optimizer, as it is not clear to me what 'the same sort of query' would
look like. I was under the impression that cost-based selection of the best
paths is only done per simple query, and I fail to see how a total
combined plan with a pulled-up subquery could be compared on cost with a
total plan where the subquery is still a separate subplan, since the
range tables / simple queries to compare are different.
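
For concreteness, the two query shapes being compared can be sketched
roughly as below. This is only a toy illustration run through Python's
sqlite3 module, not PostgreSQL, and the table and column names are
hypothetical (loosely modeled on the TPC-H lineitem table); it shows that
the two forms return the same rows, not how either would be planned or
costed.

```python
import sqlite3

# Hypothetical toy data; names are loosely modeled on TPC-H, not taken from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lineitem (l_partkey INTEGER, l_quantity REAL);
    INSERT INTO lineitem VALUES (1, 10), (1, 30), (2, 5), (2, 5), (3, 1), (3, 100);
""")

# Form A: aggregate sublink, conceptually evaluated once per outer row
# (correlated on l_partkey).
correlated = """
    SELECT l.l_partkey, l.l_quantity
    FROM lineitem l
    WHERE l.l_quantity < (SELECT 0.5 * avg(i.l_quantity)
                          FROM lineitem i
                          WHERE i.l_partkey = l.l_partkey)
    ORDER BY 1, 2
"""

# Form B: manually pulled up; the aggregate is computed once for all groups
# and the outer table is joined against the grouped result.
grouped = """
    SELECT l.l_partkey, l.l_quantity
    FROM lineitem l
    JOIN (SELECT l_partkey, 0.5 * avg(l_quantity) AS threshold
          FROM lineitem
          GROUP BY l_partkey) g
      ON g.l_partkey = l.l_partkey
    WHERE l.l_quantity < g.threshold
    ORDER BY 1, 2
"""

rows_a = conn.execute(correlated).fetchall()
rows_b = conn.execute(grouped).fetchall()
print(rows_a == rows_b)
```

Form A only computes the aggregate for groups the outer query asks about,
while form B pays for every group up front; which one is cheaper depends
on how many outer rows there are per group.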

regards,
Yeb
