
Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date:	2023-01-18 21:02:15
Message-ID:	CAH2-Wz=cJYdFww3FifTrLUYRwMzAVPVFuCZ0RcfMnibR94Rqng@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jan 18, 2023 at 12:44 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I don't know enough about the specifics of how this works to have an
> intelligent opinion about how likely these particular ideas are to
> work out. However, I think it's risky to look at estimates and try to
> infer whether they are reliable. It's too easy to be wrong. What we
> really want to do is anchor our estimates to some data source that we
> know we can trust absolutely. If you trust possibly-bad data less, it
> screws up your estimates more slowly, but it still screws them up.

Some of what I'm proposing arguably amounts to deliberately adding a
bias. But that's not an unreasonable thing in itself. I think of it as
related to the bias-variance tradeoff, which is a concept that comes
up a lot in machine learning and statistical inference.

We can afford to be quite imprecise at times, especially if we choose
a bias that we know has much less potential to do us harm -- some
mistakes hurt much more than others. We cannot afford to ever be
dramatically wrong, though -- especially in the direction of vacuuming
less often.
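
As a rough illustration of why a deliberate bias can be the safer choice when mistakes are asymmetric, here is a toy sketch. Everything in it is an assumption made for illustration: the function names, the 10x penalty for underestimating, and the error values are made up and are not taken from PostgreSQL.

```python
def asymmetric_cost(error, under_penalty=10.0):
    """Cost of a relative estimation error (hypothetical weights).

    error < 0: dead tuples were underestimated, so we vacuum too rarely.
    error > 0: dead tuples were overestimated, so we vacuum too eagerly.
    Underestimates are assumed to hurt under_penalty times more.
    """
    if error < 0:
        return under_penalty * -error
    return error


def mean_cost(errors, bias=0.0):
    """Average cost after shifting every error by a fixed bias."""
    return sum(asymmetric_cost(e + bias) for e in errors) / len(errors)


# Made-up error distribution of an unbiased but noisy estimator.
errors = [-0.5, 0.0, 0.5]

unbiased = mean_cost(errors)          # sometimes badly underestimates
biased = mean_cost(errors, bias=0.5)  # never underestimates, often overshoots
```

With these made-up numbers the biased estimator is wrong more often, but its average cost is lower, because it never errs in the expensive direction.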

Besides, there is something that we *can* place a relatively high
degree of trust in that will still be in the loop here: VACUUM itself.
If VACUUM runs then it'll call pgstat_report_vacuum(), which will set
the record straight in the event of overestimating dead tuples. To
some degree the problem of overestimating dead tuples is
self-limiting.
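
A toy model of that feedback loop, as a sketch only (this is not PostgreSQL code). The names vacuum_threshold and simulate are hypothetical and the overcount factor is made up. The trigger condition follows the documented autovacuum formula (autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples). Resetting the counter to zero is a simplification of what happens when VACUUM reports its own count through pgstat_report_vacuum().

```python
def vacuum_threshold(reltuples, base_threshold=50, scale_factor=0.2):
    """Documented autovacuum trigger: threshold + scale_factor * reltuples."""
    return base_threshold + scale_factor * reltuples


def simulate(reltuples, true_dead_per_step, overcount, steps):
    """Return (step, estimated_dead, actual_dead) for each triggered vacuum.

    overcount is a hypothetical multiplier: 1.0 means the statistics are
    accurate, 2.0 means every dead tuple is counted twice.
    """
    threshold = vacuum_threshold(reltuples)
    estimated = 0.0
    actual = 0
    triggers = []
    for step in range(1, steps + 1):
        actual += true_dead_per_step
        estimated += true_dead_per_step * overcount
        if estimated > threshold:
            triggers.append((step, estimated, actual))
            # Simplification: VACUUM removes all dead tuples and reports
            # that, so the accumulated estimation error is discarded.
            estimated = 0.0
            actual = 0
    return triggers


accurate = simulate(1000, true_dead_per_step=10, overcount=1.0, steps=30)
inflated = simulate(1000, true_dead_per_step=10, overcount=2.0, steps=30)
```

In this model an inflated estimate makes vacuum run sooner, and each run throws the accumulated error away, so the error cannot compound across vacuum cycles.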

> If Andres is correct that what really matters is the number of pages
> we're going to have to dirty, we could abandon counting dead tuples
> altogether and just count not-all-visible pages in the VM map.

That's what matters most from a cost point of view IMV. So it's a big
part of the overall picture, but not everything. It tells us
relatively little about the benefits, except perhaps when most pages
are all-visible.
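
To make the cost versus benefit distinction concrete, here is a small sketch. The Page class and both proxy functions are hypothetical, and the tables are made-up data. It assumes, as a rough proxy, that every not-all-visible page is one VACUUM may have to dirty.

```python
from dataclasses import dataclass


@dataclass
class Page:
    all_visible: bool  # the page's all-visible bit in the visibility map
    dead_tuples: int   # dead tuples actually on the page (unknown to the VM)


def cost_proxy(pages):
    """Number of not-all-visible pages, i.e. pages VACUUM may need to dirty."""
    return sum(1 for p in pages if not p.all_visible)


def benefit_proxy(pages):
    """Dead tuples that VACUUM could remove."""
    return sum(p.dead_tuples for p in pages)


# Two made-up tables with the same visibility map contents.
table_a = [Page(True, 0), Page(False, 1), Page(False, 1)]
table_b = [Page(True, 0), Page(False, 40), Page(False, 60)]
```

Both tables look identical through the visibility map (two not-all-visible pages each), yet vacuuming table_b removes 100 dead tuples while vacuuming table_a removes 2. The page count captures the cost side but says little about the benefit.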

--
Peter Geoghegan

In response to

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation at 2023-01-18 20:43:48 from Robert Haas

Responses

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation at 2023-01-18 21:12:27 from Peter Geoghegan
