Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date: 2023-01-17 22:56:16
Message-ID: CAH2-WznZekv0k+O7DXKnKq+EGTkmW8qA+HnHGF04H9kGNw0eQA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 17, 2023 at 2:11 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Jan 17, 2023 at 3:08 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > If you assume that there is chronic undercounting of dead tuples
> > (which I think is very common), ...
>
> Why do you think that?

For the reasons I gave about statistics, random sampling, the central
limit theorem. All that stuff. This matches the experience of Andres.
And it's obviously the only explanation for the reliance on
antiwraparound autovacuums for cleaning up bloat in larger OLTP
databases. It just fits: the dead tuples approach can sometimes be so
completely wrong that even an alternative triggering condition based
on something that is virtually unrelated to the thing we actually care
about can do much better in practice. Consistently, reliably, for a
given table/workload.
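
To make the failure mode concrete, here is a rough Python sketch (not
PostgreSQL code). The threshold formula mirrors autovacuum's
"autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor *
reltuples" trigger, but the 50% undercount factor is purely an invented
assumption for illustration:

# Illustrative sketch, not PostgreSQL code.  The threshold formula
# mirrors autovacuum's dead-tuples trigger; the 50% undercount factor
# is an invented assumption, purely for illustration.

def dead_tuple_trigger(n_dead_tup_estimate, reltuples,
                       vacuum_threshold=50, scale_factor=0.2):
    """Return True when the estimated dead tuples exceed the threshold."""
    vacthresh = vacuum_threshold + scale_factor * reltuples
    return n_dead_tup_estimate > vacthresh

reltuples = 10_000_000
actual_dead = 2_500_000        # 25% of the table is actually dead
undercount_factor = 0.5        # hypothetical chronic undercount

print(dead_tuple_trigger(actual_dead, reltuples))                      # True
print(dead_tuple_trigger(actual_dead * undercount_factor, reltuples))  # False

The table never crosses the threshold on the strength of its (wrong)
dead tuples estimate, so something else has to pick up the slack.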

> > How many dead heap-only tuples are equivalent to one LP_DEAD item?
> > What about page-level concentrations, and the implication for
> > line-pointer bloat? I don't have a good answer to any of these
> > questions myself.
>
> Seems a bit pessimistic. If we had unlimited resources and all
> operations were infinitely fast, the optimal strategy would be to
> vacuum after every insert, update, or delete. But in reality, that
> would be prohibitively expensive, so we're making a trade-off.

To a large degree, that's my point. I don't know how to apply this
information, so the level of detail available doesn't seem like the
main problem.

> If we had an oracle that could provide us with perfect information,
> we'd ask it, among other things, how much work will be required to
> vacuum right now, and how much benefit would we get out of doing so.

And then what would we do? What about costs?

Even if we were omniscient, we still wouldn't be omnipotent. We're
still subject to the laws of physics. VACUUM would still be something
that more or less works at the level of the whole table, or not at
all. So being omniscient seems kinda overrated to me. Adding more
information does not in general lead to better outcomes.

> The dead tuple count is related to the first question. It's not a
> direct, linear relationship, but it's not completely unrelated,
> either. Maybe we could refine the estimates by gathering more or
> different statistics than we do now, but ultimately it's always going
> to be a trade-off between getting the work done sooner (and thus maybe
> preventing table growth or a wraparound shutdown) and being able to do
> more work at once (and thus being more efficient). The current system
> of counters predates HOT and the visibility map, so it's not
> surprising if it needs updating, but if you're arguing that the whole
> concept is just garbage, I think that's an overreaction.

What I'm arguing is that principally relying on any one thing is
garbage. If you have only one thing that creates pressure to VACUUM
then there can be a big impact whenever it turns out to be completely
wrong. Whereas if VACUUM can run because of (say) 3 moderate signals
taken together, then it's much less likely that we'll be completely
wrong. In general my emphasis is on avoiding disaster in all its
forms. Vacuuming somewhat earlier and more often is perhaps suboptimal, but
far from a disaster. It's the kind of thing that we can manage.
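
Purely to illustrate the "several moderate signals" idea, here is a
hypothetical Python sketch. Every signal name and cutoff in it is
invented; it's not a proposal for actual thresholds:

# Hypothetical sketch of triggering VACUUM on several moderate signals
# taken together, rather than one signal alone.  All names and cutoffs
# here are invented for illustration.

def should_vacuum(dead_tuple_ratio, unset_vm_fraction, xid_age_fraction):
    signals = {
        "dead_tuples": dead_tuple_ratio / 0.20,   # 1.0 at 20% dead tuples
        "vm_coverage": unset_vm_fraction / 0.50,  # 1.0 at 50% pages not all-visible
        "xid_age":     xid_age_fraction,          # 1.0 at the freeze age limit
    }
    # One strong signal suffices; several moderate ones also add up.
    return max(signals.values()) >= 1.0 or sum(signals.values()) >= 1.5

print(should_vacuum(0.05, 0.10, 0.20))  # all weak          -> False
print(should_vacuum(0.25, 0.10, 0.20))  # one strong        -> True
print(should_vacuum(0.12, 0.30, 0.40))  # several moderate  -> True

The point isn't the particular arithmetic; it's that no single wrong
estimate can indefinitely suppress (or force) a VACUUM on its own.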

By all means, let's make the dead tuples/dead items stuff less naive
(e.g. make it distinguish between LP_DEAD items and dead heap-only
tuples). But even then, we shouldn't continue to completely rely on it
in the way that we do right now. In other words, I'm fine with adding
more information that is more accurate, as long as we stop making the
mistake of treating it as anything other than somewhat suspect, and
certainly as long as we don't completely rely on it if that can be
avoided at all. In particular, we need to think about both costs and
benefits at all times.
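
As a strawman for what "less naive" could mean, here is another
invented Python sketch that weights LP_DEAD stub line pointers (which
only VACUUM can clean up, because index entries still point at them)
more heavily than dead heap-only tuples (which opportunistic pruning
can remove on its own). The weights are assumptions, not anything the
stats system does today:

# Invented sketch: weight the two kinds of "dead" differently instead
# of lumping them into one counter.  The weights are assumptions.

def weighted_dead_items(n_lp_dead_items, n_dead_heap_only_tuples,
                        lp_dead_weight=1.0, heap_only_weight=0.3):
    return (lp_dead_weight * n_lp_dead_items
            + heap_only_weight * n_dead_heap_only_tuples)

# A table dominated by HOT updates produces mostly dead heap-only
# tuples, so it exerts less VACUUM pressure than the raw count alone
# would suggest.
print(weighted_dead_items(n_lp_dead_items=100_000,
                          n_dead_heap_only_tuples=900_000))  # 370000.0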

--
Peter Geoghegan
