Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples?

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples?
Date: 2022-01-30 17:00:50
Message-ID: 20220130170050.GS23027@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 27, 2022 at 01:59:38PM -0800, Peter Geoghegan wrote:
> On Thu, Jan 27, 2022 at 12:20 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > Both VACUUM and ANALYZE update pg_class.reltuples. But this code seems
> > to assume that it's only something that VACUUM can ever do. Why
> > wouldn't we expect a plain ANALYZE to have actually been the last
> > thing to update pg_class.reltuples for an append-only table? Wouldn't
> > that lead to less frequent (perhaps infinitely less frequent)
> > vacuuming for an append-only table, relative to the documented
> > behavior of autovacuum_vacuum_insert_scale_factor?
>
> PgStat_StatTabEntry.inserts_since_vacuum will continue to grow and
> grow as more tuples are inserted, until VACUUM actually runs, no
> matter what. That largely explains why this bug was missed before now:
> it's inevitable that inserts_since_vacuum will become large at some
> point -- even large relative to a bogus scaled
> pg_class.reltuples-at-ANALYZE threshold (unless ANALYZE hasn't been
> run since the last VACUUM, in which case pg_class.reltuples will be at
> the expected value anyway). And so we'll eventually get to the point
> where so many unvacuumed inserted tuples have accumulated that an
> insert-driven autovacuum still takes place.

Maybe I'm missed your point, but I think this may not rise to the level of
being a "bug".

If we just vacuumed an insert-only table, and then insert some more, and
autoanalyze runs and updates reltuples, what's wrong with vac_ins_scale_factor
* reltuples + vac_ins_base_thresh ?

You're saying reltuples should be the number of tuples at the last vacuum
instead of the most recent value from either vacuum or analyze ?

It's true that the vacuum threshold may be hit later than if reltuples hadn't
been updated by ANALYZE. If that's what you're referring to, that's the
behavior of scale factor in general. If a table is growing in size at a
constant rate, analyze will run at decreasing frequency. With the default 20%
scale factor, it'll first run at 1.2x the table's initial size, then at 1.44
(not 1.4), then at 1.728 (not 1.6), then at 2.0736 (not 1.8). That's not
necessarily desirable, but it's not necessarily wrong, either. If your table
doubles in size, you might have to adjust these things. Maybe there should be
another knob allowing perfect, "geometric" (or other) frequency, but the
behavior is not new in this patch.

We talked about that here.
https://www.postgresql.org/message-id/flat/20200305172749.GK684%40telsasoft.com#edac59123843f9f8e1abbc2b570c76f1

With the default values, analyze happens after 10% growth, and vacuum happens
after 20% (which ends up being 22% of the initial table size). The goal of
this patch was to have inserts trigger autovacuum *at all*. This behavior may
be subtle or non-ideal, but not a problem? The opposite thing could also
happen - *vacuum* could update reltuples, causing the autoanalyze threshold to
be hit a bit later.

--
Justin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michail Nikolaev 2022-01-30 17:27:43 Re: BufferAlloc: don't take two simultaneous locks
Previous Message Tom Lane 2022-01-30 16:24:35 Re: [PATCH] nodeindexscan with reorder memory leak