Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Why doesn't pgstat_report_analyze() focus on not-all-visible-page dead tuple counts, specifically?
Date: 2021-12-07 02:11:01
Message-ID: CA+TgmoZgyUBGXR6VszPJcx5dcdYvd6biOQbBeW4FhGGJs0JbDQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 6, 2021 at 8:14 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Suppose we believe that not-all-visible pages have 20 LP_DEAD items on
> average, and they turn out to only have 3 or 5. Theoretically we've
> done the wrong thing by launching autovacuum workers sooner -- we
> introduce bias. But we also have lower variance over time, which might
> make it worth it. I also think that it might not really matter at all.
> It's no great tragedy if we clean up and set pages all-visible in the
> visibility map a little earlier on average. It might even be a
> positive thing.

This doesn't seem convincing. Launching autovacuum too soon surely has
costs that someone might not want to pay. Clearly in the degenerate
case where we always autovacuum every table every time an autovacuum
worker is launched, we have gone insane. So arbitrarily large moves in
that direction can't be viewed as unproblematic.

> The fact that the user expresses the dead-tuple-wise threshold using
> autovacuum_vacuum_scale_factor is already somewhat arbitrary -- it is
> based on some pretty iffy assumptions. Even if we greatly overestimate
> dead tuples with the new algorithm, we're only doing so under
> circumstances that might have caused
> autovacuum_vacuum_insert_scale_factor to launch an autovacuum worker
> anyway. Just setting the visibility map bit has considerable value.

Now, on the other hand, I *most definitely* think
autovacuum_vacuum_scale_factor is hogwash. Everything I've seen
indicates that, while you do want to wait for a larger number of dead
tuples in a large table than in a small one, it's sublinear. I don't
know whether it's proportional to sqrt(table_size) or table_size^(1/3)
or lg(table_size) or table_size^(0.729837166538), but just plain old
table_size is definitely not it.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-12-07 02:36:12 Re: row filtering for logical replication
Previous Message Masahiko Sawada 2021-12-07 01:24:14 Re: parallel vacuum comments