Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples?
Date: 2022-02-02 20:38:51
Message-ID: CAH2-WznkSQ3HsCi1qn3-u6F551Y6CVypT2v950v-CTt0bqwBxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 1, 2022 at 3:02 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> If we wanted a more current estimate for the number of tuples in a
> relation then we could use reltuples / relpages *
> RelationGetNumberOfBlocks(r). However, I still don't see why an
> INSERT driven auto-vacuums are a particularly special case. ANALYZE
> updating the reltuples estimate had an effect on when auto-vacuum
> would trigger for tables that generally grow in the number of live
> tuples but previously only (i.e before insert vacuums existed)
> received auto-vacuum attention due to UPDATEs/DELETEs.

Sorry for the delay in my response; I've been travelling.

I admit that I jumped the gun here -- I now believe that it's
operating as originally designed. And that the design is reasonable.
It couldn't hurt to describe the design in a little more detail in the
user docs, though.

> Nothing there seems to indicate the scale is based on the historical
> table size when the table was last vacuumed/analyzed, so you could
> claim that the 3 usages of relpages when deciding if the table should
> be vacuumed and/or analyzed are all wrong and should take into account
> RelationGetNumberOfBlocks too.

The route of my confusion might interest you. vacuumlazy.c's call to
vac_update_relstats() provides relpages and reltuples values that are
very accurate relative to the VACUUM operations view of things
(relative to its OldestXmin, which uses the size of the table at the
start of the VACUUM, not the end). When the vac_update_relstats() call
is made, the same accurate-relative-toOldestXmin values might actually
*already* be out of date relative to the physical table as it is at
the end of the same VACUUM.

The fact that these accurate values could be out of date like this is
practically inevitable for a certain kind of table. But that might not
matter at all, if they were later *interpreted* in a way that took
this into account later on -- context matters.

For some reason I made the leap to thinking that everybody else
believed the same thing, too, and that the intention with the design
of the insert-driven autovacuum stuff was to capture that. But that's
just not the case, at all. I apologize for the confusion.

BTW, this situation with the relstats only being accurate "relative to
the last VACUUM's OldestXmin" is *especially* important with
new_dead_tuples values, which are basically recently dead tuples
relative to OldestXmin. In the common case where there are very few
recently dead tuples, we have an incredibly distorted "sample" of dead
tuples (it tells us much more about VACUUM implementation details than
the truth of what's going on in the table). So that also recommends
"relativistically interpreting" the values later on. This specific
issue has even less to do with autovacuum_vacuum_scale_factor than the
main point, of course. I agree with you that the insert-driven stuff
isn't a special case.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-02-02 20:49:48 Re: CREATEROLE and role ownership hierarchies
Previous Message Mark Dilger 2022-02-02 20:23:26 Re: CREATEROLE and role ownership hierarchies