Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date: 2023-01-17 22:11:07
Message-ID: CA+TgmoY7hnNnCZUcaeznbEu1FwS87v6zwjNzjjVjbskdz9ff9Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 17, 2023 at 3:08 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> If you assume that there is chronic undercounting of dead tuples
> (which I think is very common), ...

Why do you think that?

> How many dead heap-only tuples are equivalent to one LP_DEAD item?
> What about page-level concentrations, and the implication for
> line-pointer bloat? I don't have a good answer to any of these
> questions myself.

Seems a bit pessimistic. If we had unlimited resources and all
operations were infinitely fast, the optimal strategy would be to
vacuum after every insert, update, or delete. But in reality, that
would be prohibitively expensive, so we're making a trade-off.
Batching together cleanup for many table modifications reduces the
amortized cost of cleaning up after one such operation very
considerably. That's critical. But if we batch too much together, then
the actual cleanup doesn't happen soon enough to keep us out of
trouble.
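
To make the batching trade-off concrete, here is a minimal sketch of a toy cost model. It is not how PostgreSQL accounts for vacuum cost; the fixed per-pass cost, the per-item cost, and the function name are all assumptions chosen for illustration.

```python
# Toy model (hypothetical numbers): each cleanup pass pays a fixed overhead
# plus a small cost per modification it cleans up after.
def amortized_cleanup_cost(batch_size, fixed_cost=100.0, per_item_cost=1.0):
    """Cleanup cost per modification when one pass covers batch_size modifications."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return (fixed_cost + per_item_cost * batch_size) / batch_size


if __name__ == "__main__":
    for batch_size in (1, 10, 100, 1000):
        cost = amortized_cleanup_cost(batch_size)
        # The backlog waiting for cleanup grows linearly with the batch size,
        # while the amortized cost falls toward per_item_cost.
        print(f"batch={batch_size:5d}  cost/modification={cost:7.2f}  backlog={batch_size}")
```

In this model, going from a batch of 1 to a batch of 100 cuts the per-modification cost by roughly a factor of fifty, but going from 100 to 1000 gains little while letting ten times as much garbage accumulate before anything is cleaned up.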

If we had an oracle that could provide us with perfect information,
we'd ask it, among other things, how much work will be required to
vacuum right now, and how much benefit would we get out of doing so.
The dead tuple count is related to the first question. It's not a
direct, linear relationship, but it's not completely unrelated,
either. Maybe we could refine the estimates by gathering more or
different statistics than we do now, but ultimately it's always going
to be a trade-off between getting the work done sooner (and thus maybe
preventing table growth or a wraparound shutdown) and being able to do
more work at once (and thus being more efficient). The current set of
counters predates HOT and the visibility map, so it's not surprising
if it needs updating, but if you're arguing that the whole concept is
just garbage, I think that's an overreaction.
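
For reference, a simplified sketch of how the dead tuple count feeds the decision to vacuum, using the documented rule that a table is vacuumed once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples (defaults 50 and 0.2). The function name and the example table size are assumptions, and the real check also considers insert counts and wraparound age, which are omitted here.

```python
# Simplified sketch of the dead-tuple trigger only; the function name is
# hypothetical and the insert-based and wraparound triggers are left out.
def dead_tuples_trigger_vacuum(dead_tuples, reltuples,
                               base_threshold=50, scale_factor=0.2):
    """Return True if the tracked dead tuple count exceeds the vacuum threshold."""
    threshold = base_threshold + scale_factor * reltuples
    return dead_tuples > threshold


if __name__ == "__main__":
    reltuples = 1_000_000  # assumed table size for illustration
    for tracked in (100_000, 300_000):
        print(tracked, dead_tuples_trigger_vacuum(tracked, reltuples))
```

If the tracked count undercounts the true number of dead tuples, the threshold is crossed later than intended, which is one way the counters can be off without the threshold idea itself being wrong.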

--
Robert Haas
EDB: http://www.enterprisedb.com
