Skip site navigation (1) Skip section navigation (2)

Stopgap solution for table-size-estimate updating problem

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Stopgap solution for table-size-estimate updating problem
Date: 2004-11-26 22:55:56
Message-ID: 19915.1101509756@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-hackers
There's been some previous discussion of getting rid of the pg_class
columns relpages and reltuples, in favor of having the planner check the
current relation block count directly (RelationGetNumberOfBlocks) and
extrapolate the current tuple count based on the most recently measured
tuples-per-page density.  A couple of past threads are
http://archives.postgresql.org/pgsql-performance/2004-10/msg00367.php
http://archives.postgresql.org/pgsql-general/2004-08/msg01422.php
and the point came up again today:
http://archives.postgresql.org/pgsql-performance/2004-11/msg00401.php
where we were again reminded of the problems incurred by obsolete
estimates.

It occurs to me that we could get most of the bang for the buck without
making any incompatible changes: just leave the existing fields in place
but make the planner use reltuples-divided-by-relpages as the density
estimate.  That is, in place of where we have

	rel->pages = relation->rd_rel->relpages;
	rel->tuples = relation->rd_rel->reltuples;

in plancat.c, just do

	rel->pages = RelationGetNumberOfBlocks(relation);
	if (relation->rd_rel->relpages > 0)
	    density = relation->rd_rel->reltuples / relation->rd_rel->relpages;
	else
	    density = some_default_estimate;
	rel->tuples = round(rel->pages * density);

In addition to this we'd perhaps want to hack VACUUM so that when the
table is empty, it doesn't simply zero out relpages/reltuples, but
somehow preserves the previous density value so we don't have to fall
back to the default density estimate.  (This of course assumes that we
will refill the table with a density roughly similar to the last meaured
density; which might be wrong but it's still better than just using a
default, I think.)  One way to do that is to set relpages = zero
(truthfully) but set reltuples to the previously estimated density
(we can do this because it's already a float field).  It might look a
little funny to have nonzero reltuples when relpages is zero, but I
think it wouldn't break anything.  Then the above logic becomes

	rel->pages = RelationGetNumberOfBlocks(relation);
	if (relation->rd_rel->relpages > 0)
	    density = relation->rd_rel->reltuples / relation->rd_rel->relpages;
	else if (relation->rd_rel->reltuples > 0)  /* already a density */
	    density = relation->rd_rel->reltuples;
	else
	    density = some_default_estimate;
	rel->tuples = round(rel->pages * density);

A variant of this is to set reltuples = density, relpages = 1 instead
of 0, which makes the relpages value a lie but would be even less likely
to confuse client-side code.

Comments?  Does this seem like something reasonable to do for 8.0?

			regards, tom lane

pgsql-hackers by date

Next:From: Zeugswetter Andreas DAZ SDDate: 2004-11-27 00:14:50
Subject: Re: Stopgap solution for table-size-estimate updating problem
Previous:From: Tom LaneDate: 2004-11-26 21:19:18
Subject: Re: problem installing 8.0.0beta5 on OS X 10.3

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group