From: | Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: why doesn't optimizer can pull up where a > ( ... ) |
Date: | 2019-11-20 12:16:33 |
Message-ID: | CAKU4AWrLa4y-t5O10x4w1aDog7TRm5O7YnGZ8FmU7BS9qFTxWQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Nov 20, 2019 at 8:15 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> Hi Hackers:
>
> First I found the following queries running bad on pg.
>
> select count(*) from part2 p1 where p_size > 40 and p_retailprice >
> (select avg(p_retailprice) from part2 p2 where p2.p_brand=p1.p_brand);
>
> the plan is
> QUERY PLAN
>
> ------------------------------------------------------------------------------------
> Aggregate (cost=1899310537.28..1899310537.29 rows=1 width=8)
> -> Seq Scan on part2 p1 (cost=0.00..1899310456.00 rows=32513 width=0)
> Filter: ((p_size > 40) AND (p_retailprice > (SubPlan 1)))
> SubPlan 1
> -> Aggregate (cost=6331.00..6331.01 rows=1 width=32)
> -> Seq Scan on part2 p2 (cost=0.00..5956.00 rows=150000
> width=4)
> Filter: (p_brand = p1.p_brand)
>
> however if we change it to the following format, it runs pretty quick.
>
> select count(*) from part2,
> (select p_brand, avg(p_retailprice) as avg_price from part2 where p_size >
> 40 group by p_brand) p2
> where p_retailprice > p2.avg_price
> and p_size > 40
> and part2.p_brand = p2.p_brand;
>
> The above example comes from
> https://community.pivotal.io/s/article/Pivotal-Query-Optimizer-Explained with
> a litter modification.
>
> 1. why pg can't translate the query 1 to query 2. after some checking
> on pull_up_sublinks_qual_recurse, I still doesn't get the idea.
> 2. why pg can't do it, while greenplum can?
>
> Thanks
>
>
add the sql I used for testing for reference.
CREATE TABLE part2 (
p_partkey integer NOT NULL,
p_brand character(10) NOT NULL,
p_size integer NOT NULL,
p_retailprice numeric(15,2) NOT NULL
);
insert into part2 select 1, 'brand1', random_between(0, 40),
random_between(0, 40) from generate_series(1, 100000);
insert into part2 select 2, 'brand2', random_between(40, 80),
random_between(0, 40) from generate_series(1, 100000);
insert into part2 select 3, 'brand1', random_between(0, 40),
random_between(0, 40) from generate_series(1, 100000);
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2019-11-20 13:19:32 | Re: why doesn't optimizer can pull up where a > ( ... ) |
Previous Message | Andy Fan | 2019-11-20 12:15:19 | why doesn't optimizer can pull up where a > ( ... ) |