Re: Estimating costs (was Functional Indices)

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, kavoos <kavoos(at)issn(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Estimating costs (was Functional Indices)
Date: 2001-05-24 04:01:42
Message-ID: 20010524140142.A19798@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, May 23, 2001 at 10:40:38PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> > I'm not sure how common this is (long runs in a foreign key column) and it's
> > probably not worth it in the general case. So, is there a column in
> > pg_statistic where I can twiddle the per-tuple index-scan cost?
>
> You could stick a phony value into the correlation datum.

Ah, that would do it. Would need to experiment. Is this in 7.1 or 7.2?

> > We'd get better results with partial indexes anyway I think.
>
> I'd like to see the partial-index support cranked up again, for sure.
> But how does that solve your problem? I don't see the connection.

Because the queries are of the form ID1 = 'xxxx' and ID2 is null. So,
improving the index scan would make it use the index for the first clause
and scan for the second (nulls don't appear in indicies, right?)

With a partial index on the ID2 is null clause it could scan that index and
look for tuples with match the first. As indicated on that paper linked to
from the documentation, it also gives a hints to the database where most of
the queries are likely to be directed at.

The first clause matches about 0.04% of rows, the second about 5%. Most of
the time it's fine but when you start summerising data so you need to scan
many of ID1 it decides to start using an sequential scan. Look at the estimates
for the sequential and index scan:

Seq Scan on dailycalls (cost=0.00..46352.20 rows=1630 width=139)
Index Scan using dailycalls_clid on dailycalls (cost=0.00..6279.75 rows=1630 width=139)

Yet I can tell you that empirical evidence suggests that the index scan is
at least 50 times faster than the sequential scan (10 seconds to less than
0.2 seconds).

Does the planner take into account that since the total size of the database
is less than the total amount of memory, a lot of the database is likely to
be cached?

Talking about statistics, is there anywhere that counts the number of times
an index has been used? So I can check to see if all my indicies are
worthwhile.

Thanks for listening,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org>
http://svana.org/kleptog/

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Carlos Moreno 2001-05-24 04:16:31 Re: I want more Money (the data type, of course! :-))
Previous Message Niraj K. Patel 2001-05-24 03:42:22 PostgreSQL 7.03 Install fails on RedHat Linux 6.1