Re: appendum: Re: *really* simple select doesn't use indices ...

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: "Marc G(dot) Fournier" <scrappy(at)hub(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: appendum: Re: *really* simple select doesn't use indices ...
Date: 2001-05-29 13:54:47
Message-ID: 18366.991144487@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> The column 'zip' is of type text. As such, indices will not be used except
> in the case when the where clause is WHERE zip ~ '^<text>' for btree
> indices.

Uh ... nonsense.

> On Tue, 29 May 2001, Marc G. Fournier wrote:
>> globalmatch=# vacuum verbose analyze locations;
>> NOTICE: --Relation locations--
>> NOTICE: Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 76, MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec.

>> globalmatch=# explain SELECT location from locations WHERE zip = '80012';
>> NOTICE: QUERY PLAN:
>>
>> Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16)

Okay, so it thinks that "zip = '80012'" will match 4217 out of 123571
rows, which is more than enough to drive it to a sequential scan
(with an average of more than three matched rows on every page of the
relation, there'd be no I/O savings at all from consulting the index).

Since the real number of matches is only 1, this estimate is obviously
way off. In 7.1 the estimate is being driven by the frequency of the
most common value in the column --- what is the most common value?
If you're lucky, the most common value is a dummy (empty string, maybe)
that you could replace by NULL with a few simple changes in application
logic. 7.1 is smart enough to distinguish NULL from real data values
in its estimates. If you're not lucky, there really are a few values
that are far more common than average, in which case you're stuck unless
you want to run development sources. Current sources should do a lot
better on that kind of data distribution.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleg Bartunov 2001-05-29 14:12:20 haskeytype and index_formtuple
Previous Message Hannu Krosing 2001-05-29 13:28:54 problems setting shared memory on linux