why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ??

From: Frank van Vugt <ftm(dot)van(dot)vugt(at)foxi(dot)nl>
To: pgsql-performance(at)postgresql(dot)org
Subject: why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ??
Date: 2004-06-15 06:49:29
Message-ID: 200406150849.29470.ftm.van.vugt@foxi.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,

I noticed the following. Given two tables, just simply articles and their
packages:

article(id int)
package( id int, article_id int, amount)

When taking the minimum package for articles given some constraint on the
article table

select
article.id,
p_min
from
article,
(select
article_id,
min(amount) as p_min
from
package
group by
article_id
) as foo
where
article.id = foo.article_id and
article.id < 50;

The query plan shows

QUERY PLAN
----------------------------------------------------------------------------------------
Hash Join (cost=730.11..808.02 rows=1 width=8)
Hash Cond: ("outer".article_id = "inner".id)
-> Subquery Scan foo (cost=726.10..781.74 rows=4451 width=8)
-> HashAggregate (cost=726.10..737.23 rows=4451 width=8)
-> Seq Scan on package (cost=0.00..635.40 rows=18140 width=8)
-> Hash (cost=4.01..4.01 rows=1 width=4)
-> Index Scan using article_pkey on article (cost=0.00..4.01 rows=1
width=4)
Index Cond: (id < 50)

So, a seq scan on the complete package table is done, although it seems the
planner could have deduced that only the part that has 'article_id < 50'
would be relevant, since 'article.id < 50' and 'article.id = foo.article_id'

If I ditch the group by, then this contraint does get pushed into the
subselect :

select
article.id,
p_min
from
article,
(select
article_id,
amount as p_min
from
package
) as foo
where
article.id = foo.article_id and
article.id < 50;

QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop (cost=0.00..14.22 rows=5 width=8)
-> Index Scan using article_pkey on article (cost=0.00..4.01 rows=1
width=4)
Index Cond: (id < 50)
-> Index Scan using package_idx3 on package (cost=0.00..10.16 rows=4
width=8)
Index Cond: ("outer".id = package.article_id)

So it seems the math-knowledge is there ;)
I'm wondering about what I'm overlooking here. When I take the first query and
add the article constraint 'manually', I get:

select
article.id,
p_min
from
article,
(select
article_id,
min(amount) as p_min
from
package
where
article_id < 50
group by
article_id
) as foo
where
article.id = foo.article_id and
article.id < 50;

QUERY PLAN
----------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..9.65 rows=1 width=8)
Join Filter: ("outer".id = "inner".article_id)
-> Index Scan using article_pkey on article (cost=0.00..4.01 rows=1
width=4)
Index Cond: (id < 50)
-> Subquery Scan foo (cost=0.00..5.62 rows=1 width=8)
-> GroupAggregate (cost=0.00..5.61 rows=1 width=8)
-> Index Scan using package_idx3 on package (cost=0.00..5.60
rows=2 width=8)
Index Cond: (article_id < 50)

which would have been a nice plan for the first query ;)

Could someone shine a light on this, please?

--
Best,

Frank.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jeremy Dunn 2004-06-15 12:43:49 Re: 50 000 000 Table entries and I have to do a keyword search HELP NEEDED
Previous Message Neeraj 2004-06-15 06:03:40 Interpreting OSDB Results