Re: Weird indices

From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Joseph Shraibman <jks(at)selectacast(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: Weird indices
Date: 2001-02-20 07:04:14
Message-ID: Pine.BSF.4.21.0102192252490.683-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, 19 Feb 2001, Joseph Shraibman wrote:

> Stephan Szabo wrote:
> >
> > On Mon, 19 Feb 2001, Joseph Shraibman wrote:
> >
> > > > Of course, if the 10113-match estimate is wildly off (as it was in this
> > > > case), then the wrong plan may be chosen. But it IS NOT CORRECT to
> > > > suppose that indexscans always beat seqscans. The planner's job would
> > > > be a lot easier if that were true.
> > >
> > > Can't postgres do the index lookup first and find out there are only a
> > > few tuples that might match?
> >
> > Well, theoretically the estimate is supposed to match reality. There are
> > still some cases where there isn't enough information kept to allow that
> > to be true (the case where there is a single very common non-NULL value is
> > one such case).
>
> But the index should give the upper bounds of the query and show that
> this that this query is not going to return 10113 rows. It appeared to
> work like this in my query. I don't really know what his database is
> like or how many times it was updated since he last vacuumed, but it
> seems that postgres should have been able to tell that query would have
> returned much less than 10113 entries.

The problem is that the stats that are kept are woefully inadequate for
these cases. The problem is basically that IIRC it's taking the
most common value's # of appearances and using a fraction of that
as the estimate for any other value. This is not a really meaningful
estimate of the number of rows to return and there's been talk of how
to add more detailed statistics to make this number more meaningful.
And btree indexes really aren't all that good for getting the exact
number of entries - you'd be better off keeping that number somewhere
else, but MVCC would probably make that difficult, since I'd guess
that the different versions of rows would each have index entries
and not all of them apply to your transaction - which is why I think
it goes to the heap to test for visibility.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jean-Christophe Boggio 2001-02-20 07:25:11 Is that VACUUM normal ?
Previous Message Konstantinos Agouros 2001-02-20 06:36:12 -F and perl again