Re: why doesn't optimizer can pull up where a > ( ... )

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: why doesn't optimizer can pull up where a > ( ... )
Date: 2019-11-20 17:20:01
Message-ID: 20191120172001.gri67zdnwycsobxy@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 20, 2019 at 08:15:19PM +0800, Andy Fan wrote:
>Hi Hackers:
>
> First I found the following queries running bad on pg.
>
> select count(*) from part2 p1 where p_size > 40 and p_retailprice >
>(select avg(p_retailprice) from part2 p2 where p2.p_brand=p1.p_brand);
>
>the plan is
> QUERY PLAN
>------------------------------------------------------------------------------------
> Aggregate (cost=1899310537.28..1899310537.29 rows=1 width=8)
> -> Seq Scan on part2 p1 (cost=0.00..1899310456.00 rows=32513 width=0)
> Filter: ((p_size > 40) AND (p_retailprice > (SubPlan 1)))
> SubPlan 1
> -> Aggregate (cost=6331.00..6331.01 rows=1 width=32)
> -> Seq Scan on part2 p2 (cost=0.00..5956.00 rows=150000
>width=4)
> Filter: (p_brand = p1.p_brand)
>
>however if we change it to the following format, it runs pretty quick.
>
>select count(*) from part2,
>(select p_brand, avg(p_retailprice) as avg_price from part2 where p_size >
>40 group by p_brand) p2
>where p_retailprice > p2.avg_price
>and p_size > 40
>and part2.p_brand = p2.p_brand;
>
>The above example comes from
>https://community.pivotal.io/s/article/Pivotal-Query-Optimizer-Explained with
>a litter modification.
>
>1. why pg can't translate the query 1 to query 2. after some checking
>on pull_up_sublinks_qual_recurse, I still doesn't get the idea.
>2. why pg can't do it, while greenplum can?
>

I don't know the exact place(s) in the optimizer that would need to
consider this optimization, but the primary difference between the two
queries is that the first one is correlated subquery, while the second
one is not.

So I guess our optimizer is not smart enough to recognize this pattern,
and can't do the transformation from

... FROM p WHERE x > (SELECT avg(x) FROM q WHERE p.id = q.id) ...

to

... FROM p, (SELECT q.id, avg(x) x FROM q) q2 WHERE q2.id = p.id
AND q2.x < p.x

I.e. we don't have the code to consider this optimization, because no
one considered it interesting enough to submit a patch.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-11-20 17:25:37 Re: why doesn't optimizer can pull up where a > ( ... )
Previous Message Julien Rouhaud 2019-11-20 16:54:51 Re: Planning counters in pg_stat_statements (using pgss_store)