Skip site navigation (1) Skip section navigation (2)

Re: appendum: Re: *really* simple select doesn't use indices ...

From: The Hermit Hacker <scrappy(at)hub(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: appendum: Re: *really* simple select doesn't use indices ...
Date: 2001-05-29 14:13:07
Message-ID: Pine.BSF.4.33.0105291110360.82504-100000@mobile.hub.org (view raw or flat)
Thread:
Lists: pgsql-hackers
On Tue, 29 May 2001, Tom Lane wrote:

> Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> > The column 'zip' is of type text. As such, indices will not be used except
> > in the case when the where clause is WHERE zip ~ '^<text>' for btree
> > indices.
>
> Uh ... nonsense.

Oh good, I was worried there for a sec ... :)

> > On Tue, 29 May 2001, Marc G. Fournier wrote:
> >> globalmatch=# vacuum verbose analyze locations;
> >> NOTICE:  --Relation locations--
> >> NOTICE:  Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 76, MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec.
>
> >> globalmatch=# explain SELECT location from locations WHERE zip = '80012';
> >> NOTICE:  QUERY PLAN:
> >>
> >> Seq Scan on locations  (cost=0.00..2939.64 rows=4217 width=16)
>
> Okay, so it thinks that "zip = '80012'" will match 4217 out of 123571
> rows, which is more than enough to drive it to a sequential scan
> (with an average of more than three matched rows on every page of the
> relation, there'd be no I/O savings at all from consulting the index).
>
> Since the real number of matches is only 1, this estimate is obviously
> way off.  In 7.1 the estimate is being driven by the frequency of the
> most common value in the column --- what is the most common value? If
> you're lucky, the most common value is a dummy (empty string, maybe)
> that you could replace by NULL with a few simple changes in
> application logic.  7.1 is smart enough to distinguish NULL from real
> data values in its estimates.  If you're not lucky, there really are a
> few values that are far more common than average, in which case you're
> stuck unless you want to run development sources.  Current sources
> should do a lot better on that kind of data distribution.

Hit it right on the mark:

  zip  |  cnt
-------+-------
       | 81403
 00210 |     1
 00211 |     1

Will look at the code and see what I can do abuot that NULL issue ...
thanks :)


In response to

pgsql-hackers by date

Next:From: Tom LaneDate: 2001-05-29 14:13:16
Subject: Re: /contrib/unixdate: Broke in cvs tip.
Previous:From: Oleg BartunovDate: 2001-05-29 14:12:20
Subject: haskeytype and index_formtuple

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group