Re: Are statistics gathered on function indexes?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ray Ontko <rayo(at)ontko(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Are statistics gathered on function indexes?
Date: 2002-06-28 14:01:02
Message-ID: 3612.1025272862@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Ray Ontko <rayo(at)ontko(dot)com> writes:
>> It's there already; what did you think was making the difference
>> between W% and WI% ?

> Yes, but the cost doesn't continue to decline if I make the LIKE
> more and more restrictive by going from WI% to WIL% to WILL%, etc.

Yes it does, if you have a large enough table. In most scenarios
the selectivity drops off fast enough with larger strings that you
hit the minimum estimate of 1 row pretty quickly; I suppose that's
what's happening with your case. Here's an example using the 7.2
regression-test database:

-- update stats
regression=# analyze road;
ANALYZE

-- now force planner to think "road" is much larger than it really is,
-- else we can't see the change in estimate beyond WI%
regression=# update pg_class set relpages = relpages * 10000,
regression-# reltuples = reltuples * 10000 where relname = 'road';
UPDATE 1

regression=# explain select * from only road where name like 'W%';
NOTICE: QUERY PLAN:

Seq Scan on road (cost=0.00..1444625.00 rows=764903 width=89)

EXPLAIN
regression=# explain select * from only road where name like 'WI%';
NOTICE: QUERY PLAN:

Index Scan using rix on road (cost=0.00..25007.80 rows=8406 width=89)

EXPLAIN
regression=# explain select * from only road where name like 'WIJ%';
NOTICE: QUERY PLAN:

Index Scan using rix on road (cost=0.00..277.04 rows=92 width=89)

EXPLAIN
regression=# explain select * from only road where name like 'WIJK%';
NOTICE: QUERY PLAN:

Index Scan using rix on road (cost=0.00..5.28 rows=1 width=89)

EXPLAIN
regression=# explain select * from only road where name like 'WIJKL%';
NOTICE: QUERY PLAN:

Index Scan using rix on road (cost=0.00..5.23 rows=1 width=89)

EXPLAIN
regression=# explain select * from only road where name like 'WIJKLM%';
NOTICE: QUERY PLAN:

Index Scan using rix on road (cost=0.00..5.23 rows=1 width=89)

EXPLAIN
regression=#

As you can see, the estimate drops off by about a factor of 90 per
added character. This is probably too much, but it's not that easy
to determine what the ratio ought to be. The critical code involved
in this is convert_string_to_scalar in backend/utils/adt/selfuncs.c;
the ratio per character is essentially the same as the character range
that it induces from the available values. Feel free to propose a
better implementation if you can think of one.

regards, tom lane

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Ray Ontko 2002-06-28 15:10:36 Re: Are statistics gathered on function indexes?
Previous Message Ray Ontko 2002-06-28 12:40:37 Re: Are statistics gathered on function indexes?