Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> The column 'zip' is of type text. As such, indices will not be used except
> in the case when the where clause is WHERE zip ~ '^<text>' for btree
Uh ... nonsense.
> On Tue, 29 May 2001, Marc G. Fournier wrote:
>> globalmatch=# vacuum verbose analyze locations;
>> NOTICE: --Relation locations--
>> NOTICE: Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 76, MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec.
>> globalmatch=# explain SELECT location from locations WHERE zip = '80012';
>> NOTICE: QUERY PLAN:
>> Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16)
Okay, so it thinks that "zip = '80012'" will match 4217 out of 123571
rows, which is more than enough to drive it to a sequential scan
(with an average of more than three matched rows on every page of the
relation, there'd be no I/O savings at all from consulting the index).
Since the real number of matches is only 1, this estimate is obviously
way off. In 7.1 the estimate is being driven by the frequency of the
most common value in the column --- what is the most common value?
If you're lucky, the most common value is a dummy (empty string, maybe)
that you could replace by NULL with a few simple changes in application
logic. 7.1 is smart enough to distinguish NULL from real data values
in its estimates. If you're not lucky, there really are a few values
that are far more common than average, in which case you're stuck unless
you want to run development sources. Current sources should do a lot
better on that kind of data distribution.
regards, tom lane
In response to
pgsql-hackers by date
|Next:||From: Oleg Bartunov||Date: 2001-05-29 14:12:20|
|Subject: haskeytype and index_formtuple|
|Previous:||From: Hannu Krosing||Date: 2001-05-29 13:28:54|
|Subject: problems setting shared memory on linux|