From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Florian Helmberger <fh(at)25th-floor(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum
Date: 2011-05-25 17:04:27
Message-ID: 22926.1306343067@sss.pgh.pa.us
Lists: pgsql-admin pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, May 25, 2011 at 12:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah, I had been thinking about the latter point. We could be
>> conservative and just keep the reported tuple density the same (ie,
>> update relpages to the new correct value, while setting reltuples so
>> that the density ratio doesn't change). But that has its own problems
>> when the table contents *do* change. What I'm currently imagining is
>> to do a smoothed moving average, where we factor in the new density
>> estimate with a weight dependent on the percentage of the table we did
>> scan. That is, the calculation goes something like
>>
>> old_density = old_reltuples / old_relpages
>> new_density = counted_tuples / scanned_pages
>> reliability = scanned_pages / new_relpages
>> updated_density = old_density + (new_density - old_density) * reliability
>> new_reltuples = updated_density * new_relpages
>>
>> We could slow the moving-average convergence even further when
>> reliability is small; for instance if you were really paranoid you might
>> want to use the square of reliability in line 4. That might be
>> overdoing it, though.
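[Editor's note: the smoothed moving-average calculation quoted above can be
sketched as follows. The function name, signature, and example numbers are
hypothetical, chosen only to illustrate the arithmetic (including the
optional squared-reliability variant mentioned at the end of the quote);
this is not PostgreSQL's actual implementation.]

```python
def update_reltuples(old_reltuples, old_relpages,
                     counted_tuples, scanned_pages, new_relpages,
                     paranoid=False):
    """Blend the old tuple-density estimate with the newly observed
    density, weighted by the fraction of the table actually scanned."""
    old_density = old_reltuples / old_relpages
    new_density = counted_tuples / scanned_pages
    reliability = scanned_pages / new_relpages
    if paranoid:
        # Square the weight to slow convergence even further
        # when only a small fraction of the table was scanned.
        reliability *= reliability
    updated_density = old_density + (new_density - old_density) * reliability
    return updated_density * new_relpages

# Example: a 100-page table previously estimated at 100 tuples/page
# (reltuples = 10000), where a scan of 10 pages counted only 50 tuples.
# reliability = 0.1, so the density estimate moves just 10% of the way
# from 100 toward 5, giving 90.5 tuples/page -> reltuples = 9050.
print(update_reltuples(10000, 100, 50, 10, 100))
```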

> I don't know. That's maybe better, but I'd be willing to wager that
> in some cases it will just slow down the rate at which we converge to
> a completely incorrect value, while in other cases it'll fail to
> update the data when it really has changed.

[ shrug... ] When you don't have complete information, it's *always*
the case that you will sometimes make a mistake. That's not
justification for paralysis, especially not when the existing code is
demonstrably broken.

What occurs to me after thinking a bit more is that the existing tuple
density is likely to be only an estimate, too (one coming from the last
ANALYZE, which could very well have scanned even less of the table than
VACUUM did). So what I now think is that both VACUUM and ANALYZE ought
to use a calculation of the above form to arrive at a new value for
pg_class.reltuples. In both cases it would be pretty easy to track the
number of pages we looked at while counting tuples, so the same raw
information is available.

> I am wondering, though, why we're not just inserting a special-purpose
> hack for TOAST tables.

Because the problem is not specific to TOAST tables. As things
currently stand, we will accept the word of an ANALYZE as gospel even if
it scanned 1% of the table, and completely ignore the results from a
VACUUM even if it scanned 99% of the table. This is not sane.

regards, tom lane