Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date: 2022-01-07 20:24:20
Message-ID: CA+TgmoZKZCCOuKs00uPjqCUXiWbtAJa_v=UJqFSQNvAST0N8Yw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 6, 2022 at 5:46 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> One obvious reason for this is that the opportunistic freezing stuff
> is expected to be the thing that usually forces freezing -- not
> vacuum_freeze_min_age, nor FreezeLimit, nor any other XID-based
> cutoff. As you more or less pointed out yourself, we still need
> FreezeLimit as a backstop mechanism. But the value of FreezeLimit can
> just come from autovacuum_freeze_max_age/2 in all cases (no separate
> GUC), or something along those lines. We don't particularly expect the
> value of FreezeLimit to matter, at least most of the time. It should
> only noticeably affect our behavior during anti-wraparound VACUUMs,
> which become rare with the patch (e.g. my pgbench_accounts example
> upthread). Most individual tables will never get even one
> anti-wraparound VACUUM -- it just doesn't ever come for most tables in
> practice.

This seems like a weak argument. Sure, you COULD hard-code the limit
to be autovacuum_freeze_max_age/2 rather than making it a separate
tunable, but I don't think it's better. I am generally very skeptical
about the idea of using the same GUC value for multiple purposes,
because it often turns out that the optimal value for one purpose is
different than the optimal value for some other purpose. For example,
the optimal amount of memory for a hash table is likely different than
the optimal amount for a sort, which is why we now have
hash_mem_multiplier. When it's not even the same value that's being
used in both places, but the original value in one place and a value
derived from some formula in the other, the chances of things working
out are even less.
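
To spell the analogy out with concrete numbers (values here are purely
illustrative, not recommendations):

    -- Sorts are capped by work_mem, but hash-based operations may use
    -- up to work_mem * hash_mem_multiplier:
    SET work_mem = '64MB';             -- sort budget: 64MB
    SET hash_mem_multiplier = 2.0;     -- hash table budget: 128MB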

I feel generally that a lot of the argument you're making here
supposes that tables are going to get vacuumed regularly. I agree that
IF tables are being vacuumed on a regular basis, and if as part of
that we always push relfrozenxid forward as far as we can, we will
rarely have a situation where aggressive strategies to avoid
wraparound are required. However, I disagree strongly with the idea
that we can assume that tables will get vacuumed regularly. That can
fail to happen for all sorts of reasons. One of the common ones is a
poor choice of autovacuum configuration. The most common problem in my
experience is a cost limit that is too low to permit the amount of
vacuuming that is actually required, but other kinds of problems like
not enough workers (so tables get starved), too many workers (so the
cost limit is being shared between many processes), autovacuum=off
either globally or on one table (because of ... reasons),
autovacuum_vacuum_insert_threshold = -1 plus not many updates (so
nothing ever triggers the vacuum), autovacuum_naptime=1d (actually seen
in the real world! ... and, no, it didn't work well), or stats
collector problems are all possible. We can *hope* that there are
going to be regular vacuums of the table long before wraparound
becomes a danger, but realistically, we better not assume that in our
choice of algorithms, because the real world is a messy place where
all sorts of crazy things happen.
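
Just to make that concrete, here's a sketch of the kind of configuration
I'm talking about (values and the table name are purely illustrative):

    -- Settings like these starve or disable autovacuum:
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 50;        -- far too low for the write rate
    ALTER SYSTEM SET autovacuum_max_workers = 1;               -- tables get starved (needs a restart)
    ALTER SYSTEM SET autovacuum_naptime = '1d';                -- the "seen in the real world" case
    ALTER SYSTEM SET autovacuum_vacuum_insert_threshold = -1;  -- insert-only tables never get vacuumed
    SELECT pg_reload_conf();

    -- Per-table opt-out, because of ... reasons:
    ALTER TABLE some_table SET (autovacuum_enabled = false);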

Now, I agree with you in part: I don't think it's obvious that it's
useful to tune vacuum_freeze_table_age. When I advise customers on how
to fix vacuum problems, I am usually telling them to increase
autovacuum_vacuum_cost_limit, possibly also with an increase in
autovacuum_max_workers; or to increase or decrease
autovacuum_freeze_max_age depending on which problem they have; or
occasionally to adjust settings like autovacuum_naptime. It doesn't
often seem to be necessary to change vacuum_freeze_table_age or, for
that matter, vacuum_freeze_min_age. But if we remove them and then
discover scenarios where tuning them would have been useful, we'll
have no options for fixing PostgreSQL systems in the field. Waiting
for the next major release in such a scenario, or even the next minor
release, is not good. We should be VERY conservative about removing
existing settings if there's any chance that somebody could use them
to tune their way out of trouble.
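
For reference, the kind of adjustments I mean (numbers are illustrative
and have to be sized against the actual workload):

    -- Give autovacuum a bigger cost budget and more workers:
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
    ALTER SYSTEM SET autovacuum_max_workers = 6;   -- takes effect at the next restart
    SELECT pg_reload_conf();

    -- Or adjust wraparound behavior for one problem table (the per-table
    -- value can only be set lower than the system-wide one):
    ALTER TABLE pgbench_accounts SET (autovacuum_freeze_max_age = 100000000);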

> My big issue with vacuum_freeze_min_age is that it doesn't really work
> with the freeze map work in 9.6, which creates problems that I'm
> trying to address by freezing early and so on. After all, HEAD (and
> all stable branches) can easily set a page to all-visible (but not
> all-frozen) in the VM, meaning that the page's tuples won't be
> considered for freezing until the next aggressive VACUUM. This means
> that vacuum_freeze_min_age is already frequently ignored by the
> implementation -- it's conditioned on other things that are practically
> impossible to predict.
>
> Curious about your thoughts on this existing issue with
> vacuum_freeze_min_age. I am concerned about the "freezing cliff" that
> it creates.

So, let's see: if we see a page where the tuples are all-visible and
we seize the opportunity to freeze it, we can spare ourselves the need
to ever visit that page again (unless it gets modified). But if we
only mark it all-visible and leave the freezing for later, the next
aggressive vacuum will have to scan and dirty the page. I'm prepared
to believe that it's worth the cost of freezing the page in that
scenario. We've already dirtied the page and written some WAL and
maybe generated an FPW, so doing the rest of the work now rather than
saving it until later seems likely to be a win. I think it's OK to
behave, in this situation, as if vacuum_freeze_min_age=0.
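
As an aside, the size of that deferred-freezing backlog is easy to
eyeball with the pg_visibility extension; the table name here is just an
example:

    CREATE EXTENSION IF NOT EXISTS pg_visibility;

    -- Pages that are all-visible but not all-frozen are the ones the
    -- next aggressive VACUUM will have to scan, and dirty if it freezes them:
    SELECT count(*) FILTER (WHERE all_visible AND NOT all_frozen) AS visible_not_frozen,
           count(*) FILTER (WHERE all_frozen) AS frozen,
           count(*) AS total_pages
    FROM pg_visibility_map('pgbench_accounts');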

There's another situation in which vacuum_freeze_min_age could apply,
though: suppose the page isn't all-visible yet. I'd argue that in that
case we don't want to run around freezing stuff unless it's quite old
- like older than vacuum_freeze_table_age, say. Because we know we're
going to have to revisit this page in the next vacuum anyway, and
expending effort to freeze tuples that may be about to be modified
again doesn't seem prudent. So, hmm, on further reflection, maybe it's
OK to remove vacuum_freeze_min_age. But if we do, then I think we had
better carefully distinguish between the case where the page can
thereby be marked all-frozen and the case where it cannot. I guess you
say the same, further down.

> > So it's natural to decide whether or not
> > we're going to wait for cleanup locks on pages on the basis of how old
> > the XIDs they contain actually are.
>
> I agree, but again, it's only a backstop. With the patch we'd have to
> be rather unlucky to ever need to wait like this.
>
> What are the chances that we keep failing to freeze an old XID from
> one particular page, again and again? My testing indicates that it's a
> negligible concern in practice (barring pathological cases with idle
> cursors, etc).

I mean, those kinds of pathological cases happen *all the time*. Sure,
there are plenty of users who don't leave cursors open. But the ones
who do don't leave them around for short periods of time on randomly
selected pages of the table. They are disproportionately likely to
leave them on the same table pages over and over, just like data can't
in general be assumed to be uniformly accessed. And not uncommonly,
they leave them around until the snow melts.
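
Just to spell the scenario out (table and cursor names are only
examples):

    -- Session 1: open a cursor and leave it parked on a heap page.
    BEGIN;
    DECLARE idle_cur CURSOR FOR SELECT * FROM pgbench_accounts;
    FETCH 1 FROM idle_cur;
    -- ... session sits idle; the suspended scan keeps a pin on that page ...

    -- Session 2: VACUUM can't get a cleanup lock on the pinned page, so it
    -- skips it (or, in an aggressive VACUUM that must freeze it, waits).
    VACUUM VERBOSE pgbench_accounts;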

And we need to worry about those kinds of users, actually much more
than we need to worry about users doing normal things. Honestly,
autovacuum on a system where things are mostly "normal" - no
long-running transactions, adequate resources for autovacuum to do its
job, reasonable configuration settings - isn't that bad. It's true
that there are people who get surprised by an aggressive autovacuum
kicking off unexpectedly, but it's usually the first one during the
cluster lifetime (which is typically the biggest, since the initial
load tends to be bigger than later ones) and it's usually annoying but
survivable. The places where autovacuum becomes incredibly frustrating
are the pathological cases. When insufficient resources are available
to complete the work in a timely fashion, or difficult trade-offs have
to be made, autovacuum is too dumb to make the right choices. And even
if you call your favorite PostgreSQL support provider and they provide
an expert, once it gets behind, autovacuum isn't very tractable: it
will insist on vacuuming everything, right now, in an order that it
chooses, and it's not going to take any nonsense from some
human being who thinks they might have some useful advice to provide!

> But the "freeze early" heuristics work a bit like that anyway. We
> won't freeze all the tuples on a whole heap page early if we won't
> otherwise set the heap page to all-visible (not all-frozen) in the VM
> anyway.

Hmm, I didn't realize that we had that. Is that an existing thing or
something new you're proposing to do? If existing, where is it?

> > IOW, the time that it takes to freeze that one tuple *in theory* might
> > be small. But in practice it may be very large, because we won't
> > necessarily get around to it on any meaningful time frame.
>
> On second thought I agree that my specific example of 1.5 billion XIDs
> was a little too optimistic of me. But 50 million XIDs (i.e. the
> vacuum_freeze_min_age default) is too pessimistic. The important point
> is that FreezeLimit could plausibly become nothing more than a
> backstop mechanism, with the design from the patch series -- something
> that typically has no effect on what tuples actually get frozen.

I agree that it's OK for this to become a purely backstop mechanism
... but again, I think that the design of such backstop mechanisms
should be done as carefully as we know how, because users seem to hit
the backstop all the time. We want it to be made of, you know, nylon
twine, rather than, say, sharp nails. :-)

--
Robert Haas
EDB: http://www.enterprisedb.com
