Re: Partial vacuum versus pg_class.reltuples

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Partial vacuum versus pg_class.reltuples
Date: 2009-06-07 19:24:11
Message-ID: 17119.1244402651@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Am I wrong to be frightened by the implications of updating this value
> only once in a blue moon?

It's not great, but I think it's probably not catastrophic either.
Keep in mind that all we need from reltuples is that the ratio
reltuples/relpages be a reasonable estimate of the density of live
tuples, because what the planner actually uses is
GetRelationNumberOfBlocks() * reltuples / relpages.
So for example an append-only table isn't a big problem, even if it's
been quite a while since we updated reltuples and relpages.

There was some mention of having a partial vacuum extrapolate a value of
reltuples and update pg_class with that. I'm afraid that that could be
a seriously bad idea; because there is no very good reason to suppose
that the subset of recently-modified pages forms a good sample of the
whole table as far as live-tuple density goes.

[ thinks a bit and reads the code some more ... ] There is a
considerably safer alternative, which is to let ANALYZE update the
reltuples estimate based on the pages it sampled; which should be a
considerably less biased sample than the pages a partial vacuum would
have looked at. And we have already got the code doing that, either
in a standalone ANALYZE or an ANALYZE attached to a vacuum that turned
out to be partial.

So actually I think we are in better shape than I realized, so far as
the heap reltuples numbers go. The case that's a bit nasty is where
we are propagating the heap reltuples number to the index reltuples
number for a GIN index. (Remember this only matters for a partial
index.) As the code stands today, what we'll be propagating is the
reltuples estimate from the most recent ANALYZE, not the ANALYZE that
we might be about to conduct. This is not great; but considering that
we are completely ignoring the first-order problem of the partial index
predicate's selectivity, quibbling about a second-order effect like the
estimate being out of date is pretty pointless.

> Do we have any reasonable manual way of forcing
> VACUUM to scan the entire heap?

You could use VACUUM FREEZE, for instance.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stefan Kaltenbrunner 2009-06-07 19:25:59 Re: pg_migrator issue with contrib
Previous Message Josh Berkus 2009-06-07 19:18:49 Re: pg_migrator issue with contrib