Re: Partial Aggregation / GROUP BY before JOIN

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partial Aggregation / GROUP BY before JOIN
Date: 2015-09-28 07:36:31
Message-ID: 5608EDFF.4040101@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015/09/28 13:31, David Rowley wrote:
> I've been spending time working on allowing the planner to perform
> aggregation before the final join relation is created.
>
...

>
> The patch is however so far capable of giving us extremely nice performance
> improvements for some (likely artificial) queries.
>
> Let's look at a quick example:
>
> CREATE TABLE product (product_id INT NOT NULL,product_code VARCHAR(64) NOT
> NULL, PRIMARY KEY(product_id));
> CREATE UNIQUE INDEX product_product_code_uidx ON product (product_code);
> -- create small list of products
> INSERT INTO product SELECT g.id,'ABC' || CAST(g.id AS TEXT) FROM
> generate_series(1,100) g(id);
>
> CREATE TABLE sale (sale_id INT NOT NULL, product_id INT NOT NULL, quantity
> INT NOT NULL);
>
> INSERT INTO sale (sale_id, product_id,quantity) SELECT
> x.x,x.x%100+1,CAST(random() * 1000 AS INT) FROM
> generate_series(1,100000000) x(x);
>
> ALTER TABLE sale ADD CONSTRAINT sale_pkey PRIMARY KEY(sale_id);
>
> test=# SELECT count(sale.sale_id) FROM sale, product;
> count
> -------------
> 10000000000
> (1 row)
> Time: 10323.053 ms
>
>
> And if I disable the optimisation:
>
> test=# set enable_earlygrouping = off;
> SET
> Time: 0.302 ms
> test=# SELECT count(sale.sale_id) FROM sale, product;
> count
> -------------
> 10000000000
> (1 row)
> Time: 775790.764 ms
>
> So, in this probably rather unlikely query, we get something around a 7500%
> performance increase. Of course as the ratio of groups per underlying
> tuples increase, the performance increase will tail off.
>
> The explain output from the optimised version is as follows:
>
> QUERY PLAN
> ------------------------------------------------------------------------------------
> Finalize Aggregate (cost=1790544.37..1790544.38 rows=1 width=4)
> -> Nested Loop (cost=1790541.10..1790544.12 rows=100 width=4)
> -> Partial Aggregate (cost=1790541.10..1790541.11 rows=1 width=4)
> -> Seq Scan on sale (cost=0.00..1540541.08 rows=100000008
> width=4)
> -> Seq Scan on product (cost=0.00..2.00 rows=100 width=0)
>
>

Did you perhaps attach a version of the patch you didn't intend to?

I get the following plan and hence a different result from what's shown above:

postgres=# EXPLAIN SELECT count(sale.sale_id) FROM sale, product;
QUERY PLAN

--------------------------------------------------------------------------------
Aggregate (cost=17909.27..17909.28 rows=1 width=4)
-> Nested Loop (cost=17906.00..17909.02 rows=100 width=4)
-> Aggregate (cost=17906.00..17906.01 rows=1 width=4)
-> Seq Scan on sale (cost=0.00..15406.00 rows=1000000
width=4)
-> Seq Scan on product (cost=0.00..2.00 rows=100 width=0)

postgres=# SELECT count(sale.sale_id) FROM sale, product;
count
-------
100
(1 row)

postgres=# set enable_earlygrouping = off;
SET

postgres=# EXPLAIN SELECT count(sale.sale_id) FROM sale, product;
QUERY PLAN
---------------------------------------------------------------------------
Aggregate (cost=1515408.25..1515408.26 rows=1 width=4)
-> Nested Loop (cost=0.00..1265408.25 rows=100000000 width=4)
-> Seq Scan on sale (cost=0.00..15406.00 rows=1000000 width=4)
-> Materialize (cost=0.00..2.50 rows=100 width=0)
-> Seq Scan on product (cost=0.00..2.00 rows=100 width=0)
(5 rows)

postgres=# SELECT count(sale.sale_id) FROM sale, product;
count
-----------
100000000
(1 row)

Am I missing something?

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2015-09-28 07:55:26 Re: Doubt in pgbench TPS number
Previous Message Kouhei Kaigai 2015-09-28 07:34:23 Re: Foreign join pushdown vs EvalPlanQual