
RE: 7.0 like selectivity

From: "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgreSQL(dot)org>
Subject: RE: 7.0 like selectivity
Date: 2000-04-06 23:38:41
Message-ID: 000501bfa021$3d611240$2801007e@tpf.co.jp
Lists: pgsql-hackers
> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> 
> "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> writes:
> > For the query
> > select hoge_cd,shimeinn,tel
> >  from t_hoge
> >  where shimeinn like 'imag%'
> >    and tel like '012%'
> >  order by hoge_cd
> >  limit 100;
> 
> > 64 rows returned immediately.
> 
> > And for the query
> > select hoge_cd,shimeinn,tel
> >  from t_hoge
> >  where shimeinn like 'imag%'
> >    and tel like '012-3%'
> >  order by hoge_cd
> >   limit 100;
> 
> > 24 rows returned after waiting 8 minutes.
> 
> So what were the plans for these two queries?

OK, I will ask him to send them.

> Also, has this table been
> "vacuum analyzed"?
>

Yes, another problem of his was solved by "vacuum analyze".
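
For reference, this is roughly what I will ask him to run (a sketch only; the
table and column names are taken from the examples above):

	vacuum analyze t_hoge;
	explain select hoge_cd,shimeinn,tel
	 from t_hoge
	 where shimeinn like 'imag%'
	   and tel like '012-3%'
	 order by hoge_cd
	 limit 100;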
 
> > I got the following output from him.
> > explain select * from t_hoge where tel like '012%';
> > 	Index Scan using t_hoge_ix3 on t_hoge  (cost=0.00..0.23 rows=1981
> > 	width=676)
> 
> > explain select * from t_hoge where tel like '012-3%';
> > 	Index Scan using t_hoge_ix3 on t_hoge  (cost=0.00..0.00 rows=1981
> > 	width=676)
> 
> > In fact, count(*) is 342323 and 114741 respectively.
> 
> > The first problem is that the estimated cost is too low.
> > It seems that the index selectivity of '012-3%' = the index
> > selectivity of '012%' / (256*256), right?
> > If so, does it give a more practical estimation than before?
> > It doesn't correspond to the rows information either.
> 
> The rows number is fairly bogus (because it's coming from application of
> eqsel, which is not the right thing; perhaps someday LIKE should have
> its very own selectivity estimation function).  But the cost estimate
> is driven by the estimated selectivity of
> 	tel >= '012-3' AND tel < '012-4'
> and it would be nice to think that we have some handle on that.
>
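
That range form can also be checked against the planner directly; a sketch
only, using the same table as above (I don't know what estimate it would show):

	explain select * from t_hoge where tel >= '012-3' and tel < '012-4';

If its estimated cost is as tiny as the one for "tel like '012-3%'", the
problem is in the range selectivity itself rather than in the LIKE-to-range
rewrite.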

Shouldn't the rows number and the cost estimate correspond in this case?
For example, the following query would show the same rows number:
	select * from t_hoge where tel = '012';
and its cost estimate is probably > 1000.
Is it good that the cost estimate for "tel like '012%'" is much smaller
than that for "tel = '012'"?

PostgreSQL's selectivity doesn't mean a pure probability.
For example, for the int4 type the pure '=' probability is pow(2,-32).
Is the current cost estimate for "tel >= val1 and tel < val2" effective
for a narrow range of (val1,val2)?  The range ('012-3','012-4')
is veeeery narrow in the vast char(5) space.
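
If my 1/(256*256) guess above is right, that would explain why the cost
estimate is so tiny; a rough check only, using the counts reported above:

	-- '012%' matches 342323 rows; dividing by 256*256 predicts only about
	-- 5 rows for '012-3%', while the real count is 114741.
	select 342323 / (256 * 256);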

Regards.

Hiroshi Inoue
Inoue(at)tpf(dot)co(dot)jp

