Re: Eagerly scan all-visible pages to amortize aggressive vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Robert Treat <rob(at)xzilla(dot)net>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Date: 2025-01-24 14:15:28
Message-ID: CAAKRu_aShxhT0RWnFz7+jFNepfheknj3KW2XhiErJiedt5ge=A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 23, 2025 at 12:16 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Jan 22, 2025 at 5:48 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > Circling back to benchmarking, I've been running the most adversarial
> > benchmarks I could devise and can share a bit of what I've found.
> >
> > I created a "hot tail" benchmark where 16 clients insert some data and
> > then update some data older than what they just inserted but still
> > towards the end of the relation. The adversarial part is that I bulk
> > delete all the data older than X hours where X hours is always after
> > the data is eligible to be frozen but before it would be aggressively
> > vacuumed.
> >
> > That means that there are a bunch of pages that will never be frozen
> > on master but are frozen with the patch -- wasting vacuum resources. I
> > tuned vacuum_freeze_min_age and vacuum_freeze_table_age and picked the
> > DELETE window to specifically have this behavior.
> >
> > With my patch, I do see a 15-20% increase in the total time spent
> > vacuuming over the course of the multi-hour benchmark. (I only see a
> > 1% increase in the total WAL volume, though.)
>
> How much time is that in absolute terms? If the benchmark runs for 3
> hours and during that time we have 1 autovacuum worker active for 30
> minutes out of those 3 hours, that is different than if we have 5
> autovacuum workers active nearly all the time. Or, maybe a clearer way
> to put it, what percentage of the total work was the VACUUM work? If
> the total work was $100 and the VACUUM work accounted for $80 of that,
> then a 15-20% increase is pretty significant; if the total work was
> $100 and the VACUUM work accounted for $5 of that, then a 15-20%
> increase matters a lot less.

So, in this case, there is only one table in question, so 1 autovacuum
worker (and up to 2 parallel maintenance workers for index vacuuming).
The duration I provided is just the absolute wall-clock time from the
start of each vacuum to its finish -- it does not account for how long
each parallel worker was actually working, and it includes time spent
in vacuum delays.
The benchmark ran for 2.8 hours. I configured vacuum to run
frequently. In this case, master spent 47% of the total time vacuuming
and the patch spent 56%.
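
For concreteness, here is a rough sketch of the shape of that hot-tail
workload. This is purely illustrative -- the table name, schema, row
counts, and intervals below are placeholders, not the actual benchmark
script:

    # Rough shape of the hot-tail workload; names, schema, and intervals
    # are placeholders, not the real benchmark script.
    # Assumed schema: hot_tail(id bigserial primary key,
    #                          created_at timestamptz, payload text)
    import psycopg2

    def client_loop(dsn):
        conn = psycopg2.connect(dsn)
        conn.autocommit = True
        cur = conn.cursor()
        while True:
            # append new rows at the end of the relation
            cur.execute(
                "INSERT INTO hot_tail (created_at, payload) "
                "SELECT now(), repeat('x', 100) FROM generate_series(1, 10)")
            # update rows older than the ones just inserted, but still
            # near the tail of the relation
            cur.execute(
                "UPDATE hot_tail SET payload = payload "
                "WHERE id IN (SELECT id FROM hot_tail "
                "              WHERE created_at < now() - interval '5 minutes' "
                "              ORDER BY id DESC LIMIT 10)")

    def bulk_delete(dsn, hours):
        # the adversarial step: drop everything older than X hours, where X
        # falls after the rows become freeze-eligible but before an
        # aggressive vacuum would have reached them
        conn = psycopg2.connect(dsn)
        conn.autocommit = True
        with conn.cursor() as cur:
            cur.execute(
                "DELETE FROM hot_tail "
                "WHERE created_at < now() - %s * interval '1 hour'", (hours,))

Each of the 16 clients runs something like client_loop(), while a
separate process periodically runs bulk_delete() with X chosen so that
the deleted rows were already freeze-eligible but had not yet been
aggressively vacuumed.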

There was a fair amount of run-to-run variance, which comes down to
vacuum and checkpoint timing. Andres suggested off-list that I rerun the
benchmarks with checksums and FPIs disabled to remove some of this
variation. That, of course, won't give accurate total time numbers but
it should make the proportional increase more stable. I'll share some
results once I've done this.

> But tentatively I'm inclined to say this is fine.

Inherent in frontloading work is the risk of wasting it if the work
turns out to be unneeded. Unneeded work comes from one of two sources:
1) we failed to freeze the page, or 2) we succeeded in freezing the page
but it is unfrozen again before the next aggressive vacuum. Avoiding 1
would require knowledge about the distribution of page ages throughout
the relation that we decided was too expensive to gather and store.
Avoiding 2 would require prescience about the future of the workload,
and we found that building and storing a model to make predictions like
that was too complicated, error-prone, and expensive.

I think your algorithm is the best we can do if we do neither 1 nor 2.
And if this general kind of algorithm is the best we can do, then the
only levers we have are changing the caps on successes and failures or
changing when we try to eager scan. I suspect the latter won't make
much of a difference (i.e. some approaches will work better for some
workloads and worse for others). For the former, we have choices about
what we make configurable (e.g. the success cap, the failure cap, the
region size). As for choosing the defaults, I did some experimentation,
and it went pretty much as expected -- raising the failure or success
cap results in more unneeded work and lowering them results in less.
For some workloads there is a sweet spot that yields the most benefit
for the least unneeded work -- but that point is completely different
across workloads and Postgres configurations.
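
To make the lever discussion concrete, here is a minimal sketch (not
the patch's actual code) of how a per-region failure cap and a global
success cap bound the extra work of eager scanning; the names and
constants are placeholders:

    REGION_SIZE = 4096            # heap blocks per eager-scan region (placeholder)
    FAILURE_CAP_PER_REGION = 128  # freeze failures tolerated per region (placeholder)
    SUCCESS_CAP = 8192            # pages eagerly frozen per vacuum (placeholder)

    def eager_scan_pass(pages):
        """pages: iterable of (blkno, all_visible, can_freeze) in block order."""
        successes = 0
        current_region = None
        failures_in_region = 0
        eagerly_scanned = []
        for blkno, all_visible, can_freeze in pages:
            region = blkno // REGION_SIZE
            if region != current_region:
                current_region = region
                failures_in_region = 0      # the failure cap is per region
            if not all_visible:
                continue                    # non-all-visible pages are scanned anyway
            if (successes >= SUCCESS_CAP or
                    failures_in_region >= FAILURE_CAP_PER_REGION):
                continue                    # caps reached: skip the all-visible page
            eagerly_scanned.append(blkno)
            if can_freeze:
                successes += 1              # frozen ahead of the next aggressive vacuum
            else:
                failures_in_region += 1     # wasted read: the page could not be frozen
        return eagerly_scanned

Raising FAILURE_CAP_PER_REGION or SUCCESS_CAP allows more eager freezing
attempts (and therefore more potential wasted work); lowering them
bounds the waste more tightly, which is the trade-off described above.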

> Some of the work
> that VACUUM is doing is probably work that otherwise would have needed
> to happen in the foreground. For instance, the speedup in DELETEs that
> you observed might be not only because the pages are cached but
> perhaps also because DELETE doesn't need to do any non-trivial
> visibility checks. Also, users don't have to anti-optimize their
> configuration settings for their workload as you did when constructing
> the adversarial case.

Yes, for example, I hope users won't tune vacuum_freeze_min_age very
low if they are going to delete all data older than an hour. The
benchmarks were mostly a way to look for unexpected interactions -- of
which the DELETE performance was one. But other effects I was watching
for, like a massive increase in WAL volume, did not happen.

- Melanie
