From: | Frank van Vugt <ftm(dot)van(dot)vugt(at)foxi(dot)nl> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ?? |
Date: | 2004-06-15 15:30:40 |
Message-ID: | 200406151730.40464.ftm.van.vugt@foxi.nl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
> Obviously this is on toy tables
The query is simplified, yes. But the data in the tables is real, albeit
they're not that large.
> You're misinterpreting it.
I might very well be ;)
But I also get the feeling I didn't explain to you well enough what I meant...
> Without the group by, the plan is a candidate for
nestloop-with-inner-index-scan
Yes, I understand that. I only ditched the group by to check whether the
contraint on the article table was indeed recognized as a constraint on the
package table based on 'article.id = foo.article_id'. And it is/was.
> with the group by, there's another step in the way.
Yep, but on my system, package gets seq-scanned *without* any additional
constraint, resulting in a loooooong processing time.
> Pushing down into subselects does get done, for instance in CVS tip
> I can change the last part of your query to "foo.article_id < 50"
> and get ...
This is why I think I wasn't clear enough.
In the real thing, the constraint on the article table is built by some
external source and I cannot easily make assumptions to translate these to a
constraint on the package table, especially since I expect the planner to be
far better in that than me ;)
So, my base query is this:
select
article.id, p_min
from
article,
(select
article_id, min(amount) as p_min
from
package
group by
article_id
) as foo
where
article.id = foo.article_id and
<some constraint on article table>;
Now, when <constraint> = true, this obviously results in seqscans:
Hash Join (cost=1106.79..1251.46 rows=4452 width=8)
Hash Cond: ("outer".article_id = "inner".id)
-> Subquery Scan foo (cost=726.10..781.74 rows=4451 width=8)
-> HashAggregate (cost=726.10..737.23 rows=4451 width=8)
-> Seq Scan on package (cost=0.00..635.40 rows=18140 width=8)
-> Hash (cost=369.35..369.35 rows=4535 width=4)
-> Seq Scan on article (cost=0.00..369.35 rows=4535 width=4)
But when <constraint> = 'article.id < 50', only article is indexscanned:
Hash Join (cost=730.11..808.02 rows=1 width=8)
Hash Cond: ("outer".article_id = "inner".id)
-> Subquery Scan foo (cost=726.10..781.74 rows=4451 width=8)
-> HashAggregate (cost=726.10..737.23 rows=4451 width=8)
-> Seq Scan on package (cost=0.00..635.40 rows=18140 width=8)
-> Hash (cost=4.01..4.01 rows=1 width=4)
-> Index Scan using article_pkey on article (cost=0.00..4.01 rows=1
width=4)
Index Cond: (id < 50)
Which still results in poor performance due to the seqscan on package.
Putting the constraint on package is boosting performance indeed, but I cannot
make that assumption.
So, what I was asking was:
When the 'article.id < 50' constraint is added, it follows that
'foo.article_id < 50' is a constraint as well. Why is this constraint not
used to avoid the seqscan on package?
> Obviously this is on toy tables, but the point is that the constraint
> does get pushed down through the GROUP BY when appropriate.
I've seen it being pushed down when it already was defined as a constraint on
the group by, like in your example.
If necessary, I'll throw together a few commands that build some example
tables to show what I mean.
--
Best,
Frank.
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2004-06-15 18:04:41 | Re: why is a constraint not 'pushed down' into a subselect when this subselect is using a 'group by' ?? |
Previous Message | Aaron | 2004-06-15 15:16:06 | Re: 50 000 000 Table entries and I have to do a keyword search HELP NEEDED |