Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date: 2023-01-14 03:39:41
Message-ID: CAH2-Wz=GahcN9qmZOmGYhPhDnG7P+3jL546AaE9nbMSR5_o-ng@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 13, 2023 at 6:09 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I don't think the split is right. There's too much in 0001 - it's basically
> introducing the terminology of 0002 already. Could you make it a much more
> minimal change?

Okay.

I thought that you might say that. I really just wanted to show you
how small the code footprint was for the more controversial part.

> IDK, it splits up anti-wraparound vacuums into different sub-kinds but doesn't
> allow to distinguish most of them from a plain autovacuum.
>
> Seems pretty easy to display the trigger from 0001 in
> autovac_report_activity()? You'd have to move the AutoVacType -> translated
> string mapping into a separate function. That seems like a good idea anyway,
> the current coding makes translators translate several largely identical
> strings that just differ in one part.

I'll look into it. It's on my TODO list now.

> seems quite confusing / non-principled. What's the logic behind the auto
> cancel threshold being 2 x freeze_max_age, except that when freeze_max_age is
> 1 billion, the cutoff is set to 1 billion? That just makes no sense to me.

Why is the default for autovacuum_freeze_max_age 200 million? I think
that it's because 200 million is 10% of 2 billion. And so it is for
this cap. 1 billion is 50% of 2 billion. It's just numerology. It
doesn't make sense to me either.

Of course it's not *completely* arbitrary. Obviously the final auto
cancel threshold needs to be at least somewhat greater than
freeze_max_age for any of this to make any sense at all. And, we
should ideally have a comfortable amount of slack to work with, so
that things like moderate (not pathological) autovacuum worker
starvation isn't likely to defeat our attempts at avoiding the
no-auto-cancel behavior for no good reason. Finally, once we get to a
certain table age it starts to seem like a bad idea to not be
conservative about auto cancellations.

I believe that we agree on most things when it comes to VACUUM, but
I'm pretty sure that we still disagree about the value in setting
autovacuum_freeze_max_age very high. I continue to believe that
setting it over a billion or so is just a bad idea. I'm mentioning
this only because it might give you some idea of where I'm coming from
-- in general I believe that age-based settings often have only very
weak relationships with what actually matters.

It might make sense to always give a small fixed amount of headroom
when autovacuum_freeze_max_age is set to very high values. Maybe just
5 million XIDs/MXIDs. That would probably be just as effective as
(say) 500 million in almost all cases. But then you have to accept
another magic number.
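
To make the fixed-headroom idea concrete, here is a minimal C sketch. It is
only an illustration under stated assumptions, not code from the patch: the
function name, both constants, and the choice to combine the headroom with a
capped doubling rule are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants, for illustration only */
#define NO_CANCEL_CAP       UINT64_C(1000000000)  /* 1 billion XIDs/MXIDs */
#define NO_CANCEL_HEADROOM  UINT64_C(5000000)     /* 5 million XIDs/MXIDs */

/*
 * Return the table age at which autovacuum would stop being
 * auto-cancellable.  Assumed rule: double freeze_max_age, cap the result
 * at NO_CANCEL_CAP, but always leave at least NO_CANCEL_HEADROOM above
 * freeze_max_age itself.
 */
static uint32_t
no_cancel_age_with_headroom(uint32_t freeze_max_age)
{
    uint64_t doubled;
    uint64_t floor_age;

    /* documented upper bound of autovacuum_freeze_max_age */
    assert(freeze_max_age <= UINT32_C(2000000000));

    doubled = (uint64_t) freeze_max_age * 2;
    if (doubled > NO_CANCEL_CAP)
        doubled = NO_CANCEL_CAP;

    floor_age = (uint64_t) freeze_max_age + NO_CANCEL_HEADROOM;

    return (uint32_t) (doubled > floor_age ? doubled : floor_age);
}
```

With these assumed constants, the default setting of 200 million would still
get the doubled threshold, while a setting of 1 billion or more would get
exactly 5 million of headroom.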

> I think this'd be a lot more readable if you introduced a separate variable
> for the "no-cancel" threshold, rather than overloading freeze_max_age with
> different meanings. And you should remove the confusing "lowering" of the
> cutoff. Maybe something like
>
> no_cancel_age = freeze_max_age;
> if (no_cancel_age < ANTIWRAPAROUND_MAX_AGE)
> {
>     /* multiply by two, but make sure to not exceed ANTIWRAPAROUND_MAX_AGE */
>     no_cancel_age = Min((uint32) ANTIWRAPAROUND_MAX_AGE, (uint32) no_cancel_age * 2);
> }

I'm happy to do it that way, but let's decide what the algorithm
itself should be first. Or let's explore it a bit more, at least.
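
For reference while we discuss it, the quoted fragment can be written as a
self-contained function roughly like this. This is a sketch only: the
function name is made up, and the value of ANTIWRAPAROUND_MAX_AGE is assumed
to be the 1 billion figure mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration: 1 billion XIDs/MXIDs */
#define ANTIWRAPAROUND_MAX_AGE UINT32_C(1000000000)

/*
 * Compute a separate "no-cancel" threshold from freeze_max_age: twice
 * freeze_max_age, capped at ANTIWRAPAROUND_MAX_AGE, and never lower than
 * freeze_max_age itself.
 */
static uint32_t
compute_no_cancel_age(uint32_t freeze_max_age)
{
    uint32_t no_cancel_age = freeze_max_age;

    /* documented upper bound of autovacuum_freeze_max_age */
    assert(freeze_max_age <= UINT32_C(2000000000));

    if (no_cancel_age < ANTIWRAPAROUND_MAX_AGE)
    {
        /* cannot overflow: the operand is below 1 billion here */
        uint32_t doubled = no_cancel_age * 2;

        no_cancel_age = doubled < ANTIWRAPAROUND_MAX_AGE
            ? doubled
            : ANTIWRAPAROUND_MAX_AGE;
    }

    return no_cancel_age;
}
```

Under those assumptions, 200 million maps to 400 million, 600 million maps
to the 1 billion cap, and anything at or above 1 billion is returned
unchanged, so the cutoff is never lowered below freeze_max_age.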

> > The only mechanism that the patch changes is related to "prevent
> > auto-cancellation" behaviors -- which is now what the term
> > "antiwraparound" refers to.
>
> Not sure that redefining what a long-standing name refers to is helpful. It
> might be best to retire it and come up with new names.

I generally try to avoid bike-shedding, and naming things is the
ultimate source of bike shedding. I dread having to update the docs
for this stuff, too. The docs in this area (particularly "Routine
Vacuuming") are such a mess already. But perhaps you're right.

> 1) It regularly scares the crap out of users, even though it's normal. This
> is further confounded by failsafe autovacuums, where a scared reaction is
> appropriate, not being visible in pg_stat_activity.

The docs actually imply that when the system reaches the point of
entering xidStopLimit mode, you might get data corruption. Of course
that's not true (the entire point of xidStopLimit is to avoid that),
but apparently we like to keep users on their toes.

> I suspect that learning that "vacuum to prevent wraparound" isn't a
> problem, contributes to people later ignoring "must be vacuumed within ..."
> WARNINGS, which I've seen plenty times.

That point never occurred to me, but it makes perfect intuitive sense
that users would behave that way. This phenomenon is sometimes called
alarm fatigue. It can be quite dangerous to warn people about
non-issues "out of an abundance of caution".

> 3) Autovacuums triggered by tuple thresholds persistently getting cancelled
> also regularly causes outages, and they make it more likely that an
> eventual age-based vacuum will take forever.

Really? Outages? I imagine that you'd have to be constantly hammering
the table with DDL before it could happen. That's possible, but it
seems relatively obvious that doing that is asking for trouble.
Whereas the Manta postmortem (and every similar case that I've
personally seen) involved very nasty interactions that happened due to
the way components interacted with a workload that wasn't like that.

Running DDL from a cron job or from the application may not be a great
idea, but it's also quite common.

> Aspect 1) is addressed to a good degree by the proposed split of anti-wrap
> into an age and anti-cancel triggers. And could be improved by reporting
> failsafe autovacuums in pg_stat_activity.

Addressing what you call aspect 2 (the issue with disastrous
heavyweight lock traffic jams involving TRUNCATE being run from a cron
job, etc) is a big goal of mine for this patch series. You seem unsure
of how effective my approach (or an equally simple approach based on
table age heuristics) will be. Is that the case?

> Perhaps 2) could be improved a bit by emitting a WARNING message when we
> didn't cancel AV because it was anti-wraparound? But eventually I think we
> somehow need to signal the "intent" of the lock drop down into ProcSleep() or
> wherever it'd be.

That's doable, but definitely seems like separate work.

> I have two ideas around 3):
>
> First, we could introduce thresholds for the tuple thresholds, after which
> autovacuum isn't cancellable anymore.

Do you think that's a good idea? I just don't trust those statistics,
at all. As in, I think they're often complete garbage.

> Second, we could track the number of cancellations since the last [auto]vacuum
> in pgstat, and only trigger the anti-cancel behaviour when autovacuum has been
> cancelled a number of times.

In theory that would be better than an approach along the lines I've
proposed, because it is directly based on a less aggressive approach
being tried a few times, and failing a few times. That part I like.

However, I also don't like several things about this approach. First
of all it relies on relatively complicated infrastructure, for
something that can be critical. Second, it will be hard to test.
Third, perhaps it would make sense to give the earlier/less aggressive
approach (a table age av that is still autocancellable) quite a large
number of attempts before giving up. If the table age isn't really
growing too fast, why not continue to be patient, possibly for quite a
long time?

Perhaps a hybrid strategy could be useful? Something like what I came
up with already, *plus* a mechanism that gives up after (say) 1000
cancellations, and escalates to no-auto-cancel, regardless of table
age. It seems sensible to assume that a less aggressive approach is
just hopeless relatively quickly (in wall clock time and logical XID
time) once we see sufficiently many cancellations against the same
table.
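
A rough C sketch of the hybrid rule, just to pin down what I mean. Nothing
here is from the patch: the function name and parameters are hypothetical,
the per-table cancellation counter is assumed to be available from somewhere
(pgstat, say), and 1000 is only the "(say) 1000" figure from above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit: the "(say) 1000" cancellations mentioned above */
#define MAX_CANCELS_BEFORE_ESCALATION UINT32_C(1000)

/*
 * Decide whether the next autovacuum of a table should refuse
 * auto-cancellation.  Escalate when the table age has reached the
 * no-cancel threshold, or when earlier autovacuums of the same table
 * have already been cancelled too many times, regardless of table age.
 */
static bool
should_disable_auto_cancel(uint32_t table_age,
                           uint32_t no_cancel_age,
                           uint32_t cancel_count)
{
    /* a zero threshold would make every autovacuum non-cancellable */
    assert(no_cancel_age > 0);

    if (table_age >= no_cancel_age)
        return true;            /* age-based trigger */

    if (cancel_count >= MAX_CANCELS_BEFORE_ESCALATION)
        return true;            /* cancellation-count trigger */

    return false;               /* stay patient, remain cancellable */
}
```

The two conditions are deliberately independent, so a table that keeps
getting cancelled escalates even when its age is still far below the
threshold.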

--
Peter Geoghegan
