From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Partial vacuum versus pg_class.reltuples
Date: 2009-06-07 20:08:32
Message-ID: 603c8f070906071308x2d36d23fm6b29c8236539bc0a@mail.gmail.com
Lists: pgsql-hackers
On Sun, Jun 7, 2009 at 3:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Am I wrong to be frightened by the implications of updating this value
>> only once in a blue moon?
>
> It's not great, but I think it's probably not catastrophic either.
> Keep in mind that all we need from reltuples is that the ratio
> reltuples/relpages be a reasonable estimate of the density of live
> tuples, because what the planner actually uses is
> GetRelationNumberOfBlocks() * reltuples / relpages.
> So for example an append-only table isn't a big problem, even if it's
> been quite a while since we updated reltuples and relpages.
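The scaling Tom describes can be sketched quickly. This is an illustrative model, not PostgreSQL source: the point is that the planner rescales a possibly stale reltuples/relpages density by the table's *current* physical size, so an append-only table that keeps the same density stays well estimated even with old stats.

```python
def estimate_rows(current_blocks, stale_reltuples, stale_relpages):
    """Mimic GetRelationNumberOfBlocks() * reltuples / relpages.

    current_blocks is the table's physical size right now; the other two
    are the (possibly stale) values stored in pg_class.
    """
    if stale_relpages == 0:
        return 0.0
    return current_blocks * stale_reltuples / stale_relpages

# Stats taken when the table had 100 pages holding 10,000 live tuples.
# The table has since doubled by pure appends at the same density:
# the estimate doubles along with it, despite the stale stats.
print(estimate_rows(200, 10_000.0, 100))  # 20000.0
```

The estimate only degrades when the *density* drifts, which is exactly the concern raised below about bloat accumulating between anti-wraparound vacuums.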
My first reaction was to be relieved by this explanation, but on
further thought I'm not sure I was right to be relieved. The
frequency of anti-wraparound vacuums is so low that it doesn't seem
inconceivable that this ratio could change considerably on a far
shorter time scale. (For me, at least in some cases, it's less than
the frequency with which I dump+reload for maintenance reasons like OS
upgrade, PG upgrade, HW upgrade...)
> There was some mention of having a partial vacuum extrapolate a value of
> reltuples and update pg_class with that. I'm afraid that that could be
> a seriously bad idea; because there is no very good reason to suppose
> that the subset of recently-modified pages forms a good sample of the
> whole table as far as live-tuple density goes.
I think you're right about that.
> [ thinks a bit and reads the code some more ... ] There is a
> considerably safer alternative, which is to let ANALYZE update the
> reltuples estimate based on the pages it sampled; which should be a
> considerably less biased sample than the pages a partial vacuum would
> have looked at. And we have already got the code doing that, either
> in a standalone ANALYZE or an ANALYZE attached to a vacuum that turned
> out to be partial.
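The bias argument above can be made concrete with a toy simulation (hypothetical numbers, not from the thread): a table whose recently-modified pages are half-empty. Extrapolating density from a uniform random page sample, as ANALYZE does, lands near the truth; extrapolating from only the recently-modified pages, as a partial vacuum would, does not.

```python
import random

random.seed(42)  # make the sketch deterministic

# 1,000-page table: 950 full pages (20 live tuples each) plus 50
# recently-modified, half-empty pages (10 live tuples each).
tuples_per_page = [20] * 950 + [10] * 50
total = sum(tuples_per_page)  # true reltuples = 19,500

def extrapolate(sampled_pages, n_total_pages):
    """Scale the sampled live-tuple density up to the whole table."""
    density = sum(sampled_pages) / len(sampled_pages)
    return density * n_total_pages

# ANALYZE-style: a uniform random sample of 100 pages.
analyze_est = extrapolate(random.sample(tuples_per_page, 100), 1000)
# Partial-vacuum-style: only the 50 recently-modified pages.
vacuum_est = extrapolate(tuples_per_page[-50:], 1000)

print(total, round(analyze_est), round(vacuum_est))
```

The partial-vacuum extrapolation here comes out at 10,000 tuples, roughly half the true count, while the random sample stays within a few percent of it.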
I'm confused about how this squares with the previous discussion - I
thought you observed a case where this wasn't happening.
> So actually I think we are in better shape than I realized, so far as
> the heap reltuples numbers go. The case that's a bit nasty is where
> we are propagating the heap reltuples number to the index reltuples
> number for a GIN index. (Remember this only matters for a partial
> index.) As the code stands today, what we'll be propagating is the
> reltuples estimate from the most recent ANALYZE, not the ANALYZE that
> we might be about to conduct. This is not great; but considering that
> we are completely ignoring the first-order problem of the partial index
> predicate's selectivity, quibbling about a second-order effect like the
> estimate being out of date is pretty pointless.
Being one ANALYZE behind shouldn't be too bad.
>> Do we have any reasonable manual way of forcing
>> VACUUM to scan the entire heap?
>
> You could use VACUUM FREEZE, for instance.
That'll generate a fair amount of I/O.
...Robert