Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date: 2022-01-13 21:27:10
Message-ID: CAH2-WznyQO7vq-wFh-SQkGumqi2DXHu40P7168rZUACgJjQh6w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 13, 2022 at 12:19 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I can't follow this. If the idea is that we're going to
> opportunistically freeze a page whenever that allows us to mark it
> all-visible, then the remaining question is what XID age we should use
> to force freezing when that rule doesn't apply.

That is the idea, yes.

> It seems to me that
> there is a rebuttable presumption that that case ought to work just as
> it does today - and I think I hear you saying that it should NOT work
> as it does today, but should use some other threshold. Yet I can't
> understand why you think that.

Cases where we cannot get a cleanup lock fall into two sharply
distinct categories in my mind:

1. Cases where our inability to get a cleanup lock signifies nothing
at all about the page in question, or about any other page in the same
table, under the same workload.

2. Pathological cases. Cases where we're at least at the mercy of the
application to do something about an idle cursor, where the situation
may be entirely hopeless on a long enough timeline. (Whether or not it
actually happens in the end is less significant.)

As far as I can tell, based on testing, category 1 cases are fixed by
the patch series: while a small number of pages from tables in
category 1 cannot be cleanup-locked during each VACUUM, even with the
patch series, it happens at random, with no discernible pattern. The
overall result is that our ability to advance relfrozenxid is really
not impacted *over time*. It's reasonable to suppose that lightning
will not strike in the same place twice -- and it would really have to
strike several times to invalidate this assumption. It's not
impossible, but the chances over time are infinitesimal -- and the
aggregate effect over time (not any one VACUUM operation) is what
matters.
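
To make that concrete, here's a minimal sketch (illustrative only, not
the patch itself -- the function and variable names are mine) of the
bookkeeping this argument relies on: the scan tracks the oldest XID it
had to leave unfrozen, and that value becomes the candidate new
relfrozenxid at the end, so a handful of pages that couldn't be
cleanup-locked only hold relfrozenxid back by however old their own
unfrozen tuples happen to be:

#include "postgres.h"
#include "access/transam.h"

static inline void
track_oldest_unfrozen(TransactionId xid, TransactionId *new_relfrozenxid)
{
    /* Remember the oldest normal XID seen that we could not freeze */
    if (TransactionIdIsNormal(xid) &&
        TransactionIdPrecedes(xid, *new_relfrozenxid))
        *new_relfrozenxid = xid;
}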

There are seldom more than 5 or so of these pages, even on large
tables. What are the chances that some random not-yet-all-frozen block
(that we cannot freeze tuples on) will also have the oldest
couldn't-be-frozen XID, even once? And when it is the oldest, why
should it be the oldest by very many XIDs? And what are the chances
that the same page has the same problem, again and again, without that
being due to some pathological workload thing?

Admittedly you may see a blip from this -- you might notice that the
final relfrozenxid value for that one single VACUUM isn't quite as new
as you'd like. But then the next VACUUM should catch up with the
stable long-term average again. It's hard to describe exactly why
this effect is robust, but as I said, empirically it appears to be.
That might not be good enough as an explanation that
justifies committing the patch series, but that's what I see. And I
think I will be able to nail it down.

AFAICT that just leaves concern for cases in category 2. More on that below.

> Even if you eventually get there, it may take
> multiple days before you find a time when a table is immediately
> available, whereas if you had just gone over there and stood in line,
> you likely would have been seated in under an hour and savoring the
> goodness of quality deep-dish pizza not too long thereafter. The same
> principle applies here.

I think that you're focusing on individual VACUUM operations, whereas
I'm more concerned about the aggregate effect of a particular policy
over time.

Let's assume for a moment that the only thing that we really care
about is reliably keeping relfrozenxid reasonably recent. Even then,
waiting for a cleanup lock (to freeze some tuples) might be the wrong
thing to do. Waiting in line means that we're not freezing other
tuples (nobody else can either). So we're allowing ourselves to fall
behind on necessary, routine maintenance work that allows us to
advance relfrozenxid... in order to advance relfrozenxid.
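
To illustrate the trade-off (a simplified sketch of the code path in
question, not the actual lazy_scan_heap() -- the "aggressive" flag and
the elided comments stand in for the real logic): a non-aggressive
VACUUM only attempts a conditional cleanup lock and moves on when it
can't get one, while today's aggressive VACUUM waits, possibly behind
an idle cursor:

#include "postgres.h"
#include "storage/bufmgr.h"

static void
scan_one_block(Relation rel, BlockNumber blkno, bool aggressive)
{
    Buffer      buf;

    buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, NULL);

    if (!ConditionalLockBufferForCleanup(buf))
    {
        if (!aggressive)
        {
            /*
             * Can't get the cleanup lock right now.  Do what limited
             * processing we can with a shared lock, then move on and
             * keep freezing tuples on other pages.
             */
            LockBuffer(buf, BUFFER_LOCK_SHARE);
            /* ... check for tuples needing freezing, record free space ... */
            UnlockReleaseBuffer(buf);
            return;
        }

        /*
         * Aggressive VACUUM as it works today: wait for the cleanup
         * lock, potentially for a very long time if (say) an idle
         * cursor is holding a pin on the page.
         */
        LockBufferForCleanup(buf);
    }

    /* ... prune, freeze, and consider setting the page all-visible ... */
    UnlockReleaseBuffer(buf);
}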

> I do think that waiting for a cleanup lock when the age of the page is
> only vacuum_freeze_min_age seems like it might be too aggressive, but
> I don't think that's how it works. AFAICS, it's based on whether the
> vacuum is marked as aggressive, which has to do with
> vacuum_freeze_table_age, not vacuum_freeze_min_age. Let's turn the
> question around: if the age of the oldest XID on the page is >150
> million transactions and the buffer cleanup lock is not available now,
> what makes you think that it's any more likely to be available when
> the XID age reaches 200 million or 300 million or 700 million?

This is my concern -- what I've called category 2 cases have this
exact quality. So given that, why not freeze what you can, elsewhere,
on other pages that don't have the same issue (presumably the vast
majority of pages in the table)? That way you have the best possible
chance of recovering once the DBA gets a clue and fixes the issue.

> There
> is perhaps an argument for some kind of tunable that eventually shoots
> the other session in the head (if we can identify it, anyway) but it
> seems to me that regardless of what threshold we pick, polling is
> strictly less likely to find a time when the page is available than
> waiting for the cleanup lock. It has the counterbalancing advantage of
> allowing the autovacuum worker to do other useful work in the meantime
> and that is indeed a significant upside, but at some point you're
> going to have to give up and admit that polling is a failed strategy,
> and it's unclear why 150 million XIDs - or probably even 50 million
> XIDs - isn't long enough to say that we're not getting the job done
> with half measures.

That's kind of what I meant. The difference between 50 million and 150
million is rather unclear indeed. So having accepted that that might
be true, why not be open to the possibility that it won't turn out to
be true in the long run, for any given table? With the enhancements
from the patch series in place (particularly the early freezing
stuff), what do we have to lose by making the FreezeLimit XID cutoff
for freezing much higher than your typical vacuum_freeze_min_age?
Maybe the same as autovacuum_freeze_max_age or vacuum_freeze_table_age
(it can't be higher than that without making those other settings
meaningless, of course).
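
Roughly what I have in mind, as a simplified sketch (not the real
vacuum_set_xid_limits(), and the clamping step is the proposed change,
not current behavior):

#include "postgres.h"
#include "access/transam.h"

static TransactionId
sketch_freeze_limit(TransactionId oldest_xmin, int freeze_min_age,
                    int freeze_table_age)
{
    TransactionId limit;

    /*
     * Allow a much larger effective freeze_min_age, but never one that
     * exceeds vacuum_freeze_table_age/autovacuum_freeze_max_age -- those
     * settings would otherwise become meaningless.
     */
    freeze_min_age = Min(freeze_min_age, freeze_table_age);

    limit = oldest_xmin - freeze_min_age;
    if (!TransactionIdIsNormal(limit))
        limit = FirstNormalTransactionId;

    return limit;
}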

Taking a wait-and-see approach like this (not being too quick to
decide that a table is in category 1 or category 2) doesn't seem to
make wraparound failure any more likely in any scenario, while making
it less likely in some. It also gives us early
visibility into the problem, because we'll see that autovacuum can no
longer advance relfrozenxid (using the enhanced log output) where
that's generally expected.

--
Peter Geoghegan
