Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Florian Helmberger <fh(at)25th-floor(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum
Date: 2011-05-25 16:41:39
Message-ID: 22497.1306341699@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, May 25, 2011 at 11:47 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> 2. Revise the vacuum code so that it doesn't skip updating the pg_class
>> entries. We could have it count the number of pages it skipped, rather
>> than just keeping a bool, and then scale up the rel_tuples count to be
>> approximately right by assuming the skipped pages have tuple density
>> similar to the scanned ones.

> This approach doesn't seem like a good idea to me. The skipped
> portions of the table are the ones that haven't been modified in a
> while, so this is or embeds an assumption that the tuples in the hot
> and cold portions of the table are the same size. That might be true,
> but it isn't hard to think of cases where it might not be. There
> could also very easily be sampling error, because it's easy to imagine
> a situation where 99% of the table is getting skipped.

Yeah, I had been thinking about the latter point. We could be
conservative and just keep the reported tuple density the same (ie,
update relpages to the new correct value, while setting reltuples so
that the density ratio doesn't change). But that has its own problems
when the table contents *do* change. What I'm currently imagining is
to do a smoothed moving average, where we factor in the new density
estimate with a weight dependent on the percentage of the table we did
scan. That is, the calculation goes something like

old_density = old_reltuples / old_relpages
new_density = counted_tuples / scanned_pages
reliability = scanned_pages / new_relpages
updated_density = old_density + (new_density - old_density) * reliability
new_reltuples = updated_density * new_relpages

We could slow the moving-average convergence even further when
reliability is small; for instance if you were really paranoid you might
want to use the square of reliability in line 4. That might be
overdoing it, though.

regards, tom lane

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Robert Haas 2011-05-25 16:54:28 Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum
Previous Message Kevin Grittner 2011-05-25 16:37:24 Re: [ADMIN] pg_class reltuples/relpages not updated by autovacuum/vacuum

Browse pgsql-hackers by date

  From Date Subject
Next Message panam 2011-05-25 16:42:29 Re: Hash Anti Join performance degradation
Previous Message Alvaro Herrera 2011-05-25 16:41:18 Re: [BUGS] BUG #6034: pg_upgrade fails when it should not.