Re: Functional dependency in GROUP BY through JOINs

From: "David Rowley" <dgrowleyml(at)gmail(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Simon Riggs'" <simon(at)2ndQuadrant(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Functional dependency in GROUP BY through JOINs
Date: 2012-12-07 03:38:03
Message-ID: 007301cdd42c$462a4fc0$d27eef40$@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: 07 December 2012 06:22
> To: Simon Riggs
> Cc: David Rowley; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Functional dependency in GROUP BY through JOINs
>
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On 5 December 2012 23:37, David Rowley <dgrowleyml(at)gmail(dot)com>
> wrote:
> >> Though this plan might not be quite as optimal as it could be as it
> >> performs the grouping after the join.
>
> > PostgreSQL always calculates aggregation as the last step.
>
> > It's a well known optimisation to push-down GROUP BY clauses to the
> > lowest level, but we don't do that, yet.
>
> > You're right that it can make a massive difference to many queries.
>
> In the case being presented here, it's not apparent to me that there's any
> advantage to be had at all. You still need to aggregate over the rows
joining
> to each uniquely-keyed row. So how exactly are you going to "push down
> the GROUP BY", and where does the savings come from?
>

I guess the saving is, in at least this case it's because the join only
joins 10 rows on either side, but I think also because the grouping would
also be cheaper because it's done on an INT rather than CHAR().
But I'm thinking you're meaning the planner would have to know this is
cheaper and compare both versions of the plan.

I should have showed the plan of the other nested query originally, so here
it is this time.

Version = 9.2.1

test=# EXPLAIN ANALYZE
test-# SELECT p.product_code,SUM(s.quantity)
test-# FROM products p
test-# INNER JOIN bigsalestable s ON p.productid = s.productid
test-# GROUP BY p.product_code;
QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------
HashAggregate (cost=18926.22..18926.32 rows=10 width=150) (actual
time=553.403..553.405 rows=10 loops=1)
-> Hash Join (cost=1.23..18676.22 rows=50000 width=150) (actual
time=0.041..324.970 rows=1000000 loops=1)
Hash Cond: (s.productid = p.productid)
-> Seq Scan on bigsalestable s (cost=0.00..14425.00 rows=1000000
width=8) (actual time=0.012..70.627 rows=1000000 loops=1)
-> Hash (cost=1.10..1.10 rows=10 width=150) (actual
time=0.013..0.013 rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on products p (cost=0.00..1.10 rows=10
width=150) (actual time=0.004..0.007 rows=10 loops=1)
Total runtime: 553.483 ms
(8 rows)

test=# EXPLAIN ANALYZE
test-# SELECT p.product_code,s.quantity
test-# FROM products AS p
test-# INNER JOIN (SELECT productid,SUM(quantity) AS quantity FROM
bigsalestable GROUP BY productid) AS s ON p.productid = s.productid;
QUERY PLAN
----------------------------------------------------------------------------
--------------------------------------------------------
Hash Join (cost=19426.22..19431.07 rows=10 width=154) (actual
time=295.548..295.557 rows=10 loops=1)
Hash Cond: (bigsalestable.productid = p.productid)
-> HashAggregate (cost=19425.00..19427.00 rows=200 width=8) (actual
time=295.514..295.518 rows=10 loops=1)
-> Seq Scan on bigsalestable (cost=0.00..14425.00 rows=1000000
width=8) (actual time=0.004..59.330 rows=1000000 loops=1)
-> Hash (cost=1.10..1.10 rows=10 width=150) (actual time=0.017..0.017
rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on products p (cost=0.00..1.10 rows=10 width=150)
(actual time=0.010..0.012 rows=10 loops=1)
Total runtime: 295.612 ms
(8 rows)

Regards

David Rowley

> regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2012-12-07 03:41:16 Re: Functional dependency in GROUP BY through JOINs
Previous Message Stephen Frost 2012-12-07 03:35:40 Re: pg_upgrade problem with invalid indexes