Re: Pull up aggregate subquery

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pull up aggregate subquery
Date: 2011-06-04 13:53:20
Message-ID: BANLkTimz4jVkMmVkS-uTjkx82M+bBofX4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 24, 2011 at 3:47 AM, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> wrote:

> That's true. But if the planning cost is an only issue, why not adding
> new GUC for user to choose if they prefer it or not? Of course if we
> have some method to predict which way to go before proving both ways,
> it's great. Do you have some blue picture on it?

I like your simple patch and looks like it fixes your concern.

Your problem statement ignores the fact that most people would not
write the "original query" like this

select m_id, sum_len from size_m m inner join(select m_id,
sum(length(val)) as sum_len from size_l group by m_id)l on m.id =
l.m_id where val = '1';

they would write it like this

select m_id, sum(length(val))
from size_m m join size_l l on m.id = l.m_id
where val = '1'
group by m_id;

Which gives a far worse plan and one that is not solved by your patch.
Your way of writing the SQL is one of the "hand optimized" ways that
an SQL expert would try to re-write the SQL. We shouldn't be
optimizing only for hand-altered code, since it can always be further
tweaked by hand. We should be optimizing the original, simple queries
(as well as other forms of expressing the same thing).

This highlights that we do not have the infrastructure to push
aggregates up or down, and that the lack of a known "primary key" for
the output of each plan node prevents us from developing a general
transformation infrastructure to solve the general case. That
particular piece of infrastructure is also an essential step towards
materialized views, which would be pretty useless without the
capability to transform aggregates up and down the join tree.

In terms of costing, I think it would be likely that we can apply
simple heuristics. We already assume that applying quals down to the
lowest level possible make sense. I would guess that anything that
reduces the number of rows should be pushed down as far as possible.
I'm sure there are cases where that isn't true, but lets not stop from
solving simple general cases because of the theoretical existence of
complex cases

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2011-06-04 13:59:40 Re: reducing the overhead of frequent table locks - now, with WIP patch
Previous Message Simon Riggs 2011-06-04 13:11:52 Re: creating CHECK constraints as NOT VALID