Statistics on key distribution (was: Re: order by and index path)

From: jwieck(at)debis(dot)com (Jan Wieck)
To: lockhart(at)alumni(dot)caltech(dot)edu (Thomas G(dot) Lockhart)
Cc: jwieck(at)debis(dot)com, andreas(dot)zeugswetter(at)telecom(dot)at, hackers(at)postgreSQL(dot)org
Subject: Statistics on key distribution (was: Re: order by and index path)
Date: 1998-10-15 15:03:51
Message-ID: m0zTowS-000EBRC@orion.SAPserv.Hamburg.dsh.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> > We do not have this statistical information. So the whole
> > thing is at this time academic.
>
> I recall that Commercial Ingres made the assumption that one row (or 1%
> of rows? My memory of Ingres is fading :) would be returned from a
> qualified query if no statistics were available to suggest otherwise.
>
> It did collect statistics on data distribution to try to help make those
> optimizer choices.
>
> It may be reasonable to assume that if there is an index, then using it
> with any qualified query would be a win. Since the alternative is to
> decide to _not_ use an index, a decision for which we have no support
> with existing statistics.

It may be also reasonable to collect statistic information
and use that to quantify the cost of an index scan.

The vacuum cleaner scans all indices on a relation vacuum'd
completely. And at that time it already knows the number of
pages and tuples in the heap relation (has that in the
vcrelstats).

Based on this it could decide to take every n'th index tuple
while scanning and drop them somewhere where other backends
can find them. This would be the statistical information
needed by the optimizer to estimate the real cost of an index
scan. It is only of interest for big tables, where hopping
from block to block will make an index scan a looser against
a seqscan in a many row matching scan. So it's up to the
optimizer do decide based on the # of pages if statistical
information is really required for cost calculation.

Having the final indexqual along with the statistical
information it will be a little tricky to figure out how many
rows it might return, but not impossible.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck(at)debis(dot)com (Jan Wieck) #

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleg Bartunov 1998-10-15 15:05:31 small bug in src/interfaces/ecpg/lib/Makefile.in
Previous Message Brook Milligan 1998-10-15 14:48:45 Re: [PATCHES] TCL/TK configuration clean-up patches