Re: appendum: Re: *really* simple select doesn't use indices ...

From: The Hermit Hacker <scrappy(at)hub(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: appendum: Re: *really* simple select doesn't use indices ...
Date: 2001-05-29 14:13:07
Message-ID: Pine.BSF.4.33.0105291110360.82504-100000@mobile.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 29 May 2001, Tom Lane wrote:

> Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> > The column 'zip' is of type text. As such, indices will not be used except
> > in the case when the where clause is WHERE zip ~ '^<text>' for btree
> > indices.
>
> Uh ... nonsense.

Oh good, I was worried there for a sec ... :)

> > On Tue, 29 May 2001, Marc G. Fournier wrote:
> >> globalmatch=# vacuum verbose analyze locations;
> >> NOTICE: --Relation locations--
> >> NOTICE: Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 76, MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec.
>
> >> globalmatch=# explain SELECT location from locations WHERE zip = '80012';
> >> NOTICE: QUERY PLAN:
> >>
> >> Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16)
>
> Okay, so it thinks that "zip = '80012'" will match 4217 out of 123571
> rows, which is more than enough to drive it to a sequential scan
> (with an average of more than three matched rows on every page of the
> relation, there'd be no I/O savings at all from consulting the index).
>
> Since the real number of matches is only 1, this estimate is obviously
> way off. In 7.1 the estimate is being driven by the frequency of the
> most common value in the column --- what is the most common value? If
> you're lucky, the most common value is a dummy (empty string, maybe)
> that you could replace by NULL with a few simple changes in
> application logic. 7.1 is smart enough to distinguish NULL from real
> data values in its estimates. If you're not lucky, there really are a
> few values that are far more common than average, in which case you're
> stuck unless you want to run development sources. Current sources
> should do a lot better on that kind of data distribution.

Hit it right on the mark:

zip | cnt
-------+-------
| 81403
00210 | 1
00211 | 1

Will look at the code and see what I can do abuot that NULL issue ...
thanks :)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2001-05-29 14:13:16 Re: /contrib/unixdate: Broke in cvs tip.
Previous Message Oleg Bartunov 2001-05-29 14:12:20 haskeytype and index_formtuple