Re: 7.0.2 and 6.5.3 performance

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: lec <englim(at)pc(dot)jaring(dot)my>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: 7.0.2 and 6.5.3 performance
Date: 2000-09-18 02:26:50
Message-ID: 9634.969244010@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

lec <englim(at)pc(dot)jaring(dot)my> writes:
> explain select itm_section, itm_brand, itm_itemno, sum(stl_qty)
> from stl, item where stl_itemno = itm_itemno
> and stl_discounttype in ('','S','V')
> and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000'
> and stl_status='N'
> and itm_sectiontype='O'
> and itm_section >='101'
> and itm_section <='135'
> group by itm_section, itm_brand, itm_itemno;

> The following is the query/explain under 7.0.2:
> -> Nested Loop (cost=0.00..27770.88 rows=1 width=72)
> -> Seq Scan on item (cost=0.00..2549.45 rows=1 width=36)
> -> Index Scan using stl_datetrx on stl (cost=0.00..25157.34 rows=5127 width=36)
>
> The following is the query/explain under 6.5.3:
> -> Hash Join (cost=25780.13 rows=55386 width=72)
> -> Index Scan using stl_datetrx on stl (cost=15488.75 rows=210658 width=36)
> -> Hash (cost=3225.66 rows=2879 width=36)
> -> Seq Scan on item (cost=3225.66 rows=2879 width=36)

What's driving the difference in plan choice is evidently the much
smaller number of rows that 7.0 estimates will match the query
restrictions. Specifically, it's figuring that only one item row
will match
and itm_sectiontype='O'
and itm_section >='101'
and itm_section <='135'
whereas 6.5 estimates 2879. With only one row to match, it figures it
might as well just do the join with a nested loop and not bother to set
up a hashtable.

Since you're complaining, I suppose this statistical estimate is way
off :-(. What do you actually get from

select count(*) from stl where
stl_discounttype in ('','S','V')
and stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000'
and stl_status='N';

select count(*) from item where
itm_sectiontype='O'
and itm_section >='101'
and itm_section <='135';
?

I suspect the reason for the difference in estimated row counts is
that 7.0 recognizes that these are range queries; instead of treating
"stl_trxdate>='Jul 01, 2000' and stl_trxdate<='Jul 02, 2000'" as two
independent clauses, or "itm_section >='101' and itm_section <='135'"
as independent clauses, it combines those clauses and tries to estimate
the fraction of the data range being requested. This is usually a win
but it's possible to lose rather badly if the requested range covers
a spike in the data's frequency distribution. We don't yet have
adequate statistics to know that a spike may be involved...

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message luc00 2000-09-18 04:04:41 OFF topic : SQL interface to non-SQL as DBF ?
Previous Message Robert D Oden, IS Director 2000-09-18 02:25:37 Re: Regression test failures