From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation |
Date: | 2023-01-18 21:02:15 |
Message-ID: | CAH2-Wz=cJYdFww3FifTrLUYRwMzAVPVFuCZ0RcfMnibR94Rqng@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jan 18, 2023 at 12:44 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I don't know enough about the specifics of how this works to have an
> intelligent opinion about how likely these particular ideas are to
> work out. However, I think it's risky to look at estimates and try to
> infer whether they are reliable. It's too easy to be wrong. What we
> really want to do is anchor our estimates to some data source that we
> know we can trust absolutely. If you trust possibly-bad data less, it
> screws up your estimates more slowly, but it still screws them up.
Some of what I'm proposing arguably amounts to deliberately adding a
bias. But that's not an unreasonable thing in itself. I think of it as
related to the bias-variance tradeoff, which is a concept that comes
up a lot in machine learning and statistical inference.
We can afford to be quite imprecise at times, especially if we choose
a bias that we know has much less potential to do us harm -- some
mistakes hurt much more than others. We cannot afford to ever be
dramatically wrong, though -- especially in the direction of vacuuming
less often.
Besides, there is something that we *can* place a relatively high
degree of trust in that will still be in the loop here: VACUUM itself.
If VACUUM runs then it'll call pgstat_report_vacuum(), which will set
the record straight in the event of over estimating dead tuples. To
some degree the problem of over estimating dead tuples is
self-limiting.
> If Andres is correct that what really matter is the number of pages
> we're going to have to dirty, we could abandon counting dead tuples
> altogether and just count not-all-visible pages in the VM map.
That's what matters most from a cost point of view IMV. So it's a big
part of the overall picture, but not everything. It tells us
relatively little about the benefits, except perhaps when most pages
are all-visible.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2023-01-18 21:05:51 | Re: Extracting cross-version-upgrade knowledge from buildfarm client |
Previous Message | Mark Dilger | 2023-01-18 20:58:26 | Re: Non-superuser subscription owners |