From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> |
Cc: | "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: 7.0 like selectivity |
Date: | 2000-04-06 15:29:28 |
Message-ID: | 27244.955034968@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
"Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> writes:
> For the query
> select hoge_cd,shimeinn,tel
> from t_hoge
> where shimeinn like 'imag%'
> and tel like '012%'
> order by hoge_cd
> limit 100;
> 64 rows returned immediately.
> And for the query
> select hoge_cd,shimeinn,tel
> from t_hoge
> where shimeinn like 'imag%'
> and tel like '012-3%'
> order by hoge_cd
> limit 100;
> 24 rows returned after waiting 8 minutes.
So what were the plans for these two queries? Also, has this table been
"vacuum analyzed"?
> I got the following output from him.
> explain select * from t_hoge where tel like '012%';
> Index Scan using t_hoge_ix3 on t_hoge (cost=0.00..0.23 rows=1981
> width=676)
> explain select * from t_hoge where tel like '012-3%';
> Index Scan using t_hoge_ix3 on t_hoge (cost=0.00..0.00 rows=1981
> width=676)
> In fact,count(*) is 342323 and 114741 respectively.
> The first problem is that estimated cost is too low.
> It seems that the index selectivity of '012-3%' = the index
> selectivity of '012%' / (256*256),right ?
> If so,does it give more practical estimation than before ?
> It doesn't correspond to rows information either.
The rows number is fairly bogus (because it's coming from application of
eqsel, which is not the right thing; perhaps someday LIKE should have
its very own selectivity estimation function). But the cost estimate
is driven by the estimated selectivity of
tel >= '012-3' AND tel < '012-4'
and it would be nice to think that we have some handle on that.
It could be that the thing is getting fooled by a very non-uniform
distribution of telephone numbers. You indicate that most of the
numbers in the DB begin with '012', but if there are a small number
that begin with digits as high as 9, the selectivity estimates would
be pretty bad.
> In reality, * shimeinn like 'imag%' * is much more restrictive
> than * tel like '012-3%' *. However I couldn't think of the
> way to foresee which is more restrictive. Now I doubt whether
> we have enough information to estimate LIKE selectivity
> correctly.
No, we don't, because we only keep track of the min and max values
in each column and assume that the data is uniformly distributed
between those limits. Perhaps someday we could keep a histogram
instead --- but VACUUM ANALYZE would get a lot slower and more
complicated ...
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2000-04-06 15:31:40 | Re: Book and TEMP vs. TEMPORARY |
Previous Message | Bruce Momjian | 2000-04-06 15:12:45 | Book and TEMP vs. TEMPORARY |