Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date: 2022-01-17 21:27:48
Message-ID: CAH2-WzmQ667LNfQvOJKx3G6Cy644HLO8yjgR4OOrA9jStO6d7Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 17, 2022 at 7:12 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Jan 13, 2022 at 4:27 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > 1. Cases where our inability to get a cleanup lock signifies nothing
> > at all about the page in question, or any page in the same table, with
> > the same workload.
> >
> > 2. Pathological cases. Cases where we're at least at the mercy of the
> > application to do something about an idle cursor, where the situation
> > may be entirely hopeless on a long enough timeline. (Whether or not it
> > actually happens in the end is less significant.)
>
> Sure. I'm worrying about case (2). I agree that in case (1) waiting
> for the lock is almost always the wrong idea.

I don't doubt that we'd each have little difficulty determining
which category (1 or 2) a given real world case should be placed in,
using a variety of methods that put the issue in context (e.g.,
looking at the application code, talking to the developers or the
DBA). Of course, it doesn't follow that it would be easy to teach
vacuumlazy.c how to determine which category the same "can't get
cleanup lock" falls under, since (just for starters) there is no
practical way for VACUUM to see all that context.

That's what I'm effectively trying to work around with this "wait and
see" approach that demotes FreezeLimit to a backstop (and so justifies
removing the vacuum_freeze_min_age GUC that directly dictates our
FreezeLimit today). The cure may be worse than the disease, and the cure
isn't actually all that great at the best of times, so we should wait
until the disease visibly gets pretty bad before being
"interventionist" by waiting for a cleanup lock.

I've already said plenty about why I don't like vacuum_freeze_min_age
(or FreezeLimit) due to XIDs being fundamentally the wrong unit. But
that's not the only fundamental problem that I see. The other problem
is this: vacuum_freeze_min_age also dictates when an aggressive VACUUM
will start to wait for a cleanup lock. But why should the first thing
be the same as the second thing? I see absolutely no reason for it.
(Hence the idea of making FreezeLimit a backstop, and getting rid of
the GUC itself.)
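
To make the coupling described above concrete, here is a rough sketch
in Python. It is not vacuumlazy.c code: every function name is
hypothetical, XID wraparound arithmetic is ignored, and it assumes
(as a simplification) that FreezeLimit is just OldestXmin minus
vacuum_freeze_min_age. The first predicate uses a single cutoff for
both freezing and waiting; the second shows what a separate backstop
cutoff might look like.

```python
def freeze_limit(oldest_xmin: int, freeze_min_age: int) -> int:
    """Simplified FreezeLimit: XIDs older than this are eligible for freezing.

    Assumption: plain integer subtraction, no 32-bit XID wraparound handling.
    """
    return oldest_xmin - freeze_min_age


def must_wait_coupled(page_oldest_xid: int, aggressive: bool,
                      oldest_xmin: int, freeze_min_age: int) -> bool:
    """Coupled behavior: the freezing cutoff also decides when an
    aggressive VACUUM waits for a cleanup lock on a pinned page."""
    return aggressive and page_oldest_xid < freeze_limit(oldest_xmin, freeze_min_age)


def must_wait_decoupled(page_oldest_xid: int, aggressive: bool,
                        backstop_xid: int) -> bool:
    """Decoupled behavior (hypothetical): waiting is decided by a separate,
    typically much older, backstop cutoff."""
    return aggressive and page_oldest_xid < backstop_xid
```

With oldest_xmin=1000 and freeze_min_age=100, a pinned page whose
oldest XID is 850 forces a wait in the coupled version, but not in the
decoupled version when the backstop is set at, say, 500.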

> > This is my concern -- what I've called category 2 cases have this
> > exact quality. So given that, why not freeze what you can, elsewhere,
> > on other pages that don't have the same issue (presumably the vast
> > vast majority in the table)? That way you have the best possible
> > chance of recovering once the DBA gets a clue and fixes the issue.
>
> That's the part I'm not sure I believe.

To be clear, I think that I have yet to adequately demonstrate that
this is true. It's a bit tricky to do so -- absence of evidence isn't
evidence of absence. I think that your principled skepticism makes
sense right now.

Fortunately the early refactoring patches should be uncontroversial.
The controversial parts are all in the last patch in the patch series,
which isn't too much code. (Plus another patch to at least get rid of
vacuum_freeze_min_age, and maybe vacuum_freeze_table_age too, that
hasn't been written just yet.)

> Imagine a table with a
> gigantic number of pages that are not yet all-visible, a small number
> of all-visible pages, and one page containing very old XIDs on which a
> cursor holds a pin. I don't think it's obvious that not waiting is
> best. Maybe you're going to end up vacuuming the table repeatedly and
> doing nothing useful. If you avoid vacuuming it repeatedly, you still
> have a lot of work to do once the DBA locates a clue.

Maybe this is a simpler way of putting it: I want to delay waiting on
a pin until it's pretty clear that we truly have a pathological case,
which should in practice be limited to an anti-wraparound VACUUM,
which will now be naturally rare -- most individual tables will
literally never have even one anti-wraparound VACUUM.

We don't need to reason about the vacuuming schedule this way, since
anti-wraparound VACUUMs are driven by age(relfrozenxid) -- we don't
really have to predict anything. Maybe we'll need to do an
anti-wraparound VACUUM immediately after a non-aggressive autovacuum
runs, without getting a cleanup lock (due to an idle cursor
pathological case). We won't be able to advance relfrozenxid until the
anti-wraparound VACUUM runs (at the earliest) in this scenario, but it
makes no difference. Rather than predicting the future, we're covering
every possible outcome (at least to the extent that that's possible).
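
The scheduling argument above can be sketched as follows (Python,
illustration only). The one real detail is that autovacuum forces an
anti-wraparound VACUUM once age(relfrozenxid) exceeds
autovacuum_freeze_max_age, whose default is 200 million. Everything
else is an assumption: the function names are made up, wraparound
arithmetic is ignored, and "only an anti-wraparound VACUUM waits on a
pin" is a simplified reading of the policy being discussed.

```python
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000  # default autovacuum_freeze_max_age


def xid_age(next_xid: int, relfrozenxid: int) -> int:
    """Simplified age(relfrozenxid); ignores 32-bit XID wraparound."""
    return next_xid - relfrozenxid


def plan_vacuum(next_xid: int, relfrozenxid: int,
                freeze_max_age: int = AUTOVACUUM_FREEZE_MAX_AGE):
    """Return (is_antiwraparound, wait_for_cleanup_lock) for the next VACUUM.

    Hypothetical policy: waiting on a pin is reserved for the
    anti-wraparound case, which is decided purely by the current age.
    """
    antiwraparound = xid_age(next_xid, relfrozenxid) > freeze_max_age
    return antiwraparound, antiwraparound
```

Nothing here has to guess what the previous VACUUM managed to do. If
an earlier non-aggressive VACUUM could not advance relfrozenxid, the
age simply keeps growing until plan_vacuum() returns (True, True).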

> I think there's probably an important principle buried in here: the
> XID threshold that forces a vacuum had better also force waiting for
> pins. If it doesn't, you can tight-loop on that table without getting
> anything done.

I absolutely agree -- that's why I think that we still need
FreezeLimit. Just as a backstop, that in practice very rarely
influences our behavior. Probably just in those remaining cases that
are never vacuumed except for the occasional anti-wraparound VACUUM
(even then it might not be very important).

> We should probably distinguish between the situation where (a) an
> adverse pin is held continuously and effectively forever and (b)
> adverse pins are held frequently but for short periods of time.

I agree. It's just hard to do that from vacuumlazy.c, during a routine
non-aggressive VACUUM operation.

> I think it's possible to imagine a small, very hot table (or portion of
> a table) where very high concurrency means there are often pins. In
> case (a), it's not obvious that waiting will ever resolve anything,
> although it might prevent other problems like infinite looping. In
> case (b), a brief wait will do a lot of good. But maybe that doesn't
> even matter. I think part of your argument is that if we fail to
> update relfrozenxid for a while, that really isn't that bad.

Yeah, that is a part of it -- it doesn't matter (until it really
matters), and we should be careful to avoid making the situation worse
by waiting for a cleanup lock unnecessarily. That's actually a very
drastic thing to do, at least in a world where freezing has been
decoupled from advancing relfrozenxid.

Updating relfrozenxid should now be thought of as a continuous thing,
not a discrete thing. And so it's highly unlikely that any given
VACUUM will ever *completely* fail to advance relfrozenxid -- that
fact alone signals a pathological case (things that are supposed to be
continuous should not ever appear to be discrete). But you need multiple
VACUUMs to see this "signal". It is only revealed over time.
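
As an illustration of how the "signal" only shows up across several
VACUUMs, here is a small Python sketch. It is purely hypothetical:
VACUUM does not keep a history like this, and the function names and
the threshold of two stalls are made up for the example.

```python
def consecutive_stalls(relfrozenxid_history):
    """Count how many of the most recent VACUUMs failed to advance
    relfrozenxid at all (history is ordered oldest to newest)."""
    stalls = 0
    # Walk backwards over (previous, current) pairs of recorded values.
    for i in range(len(relfrozenxid_history) - 1, 0, -1):
        if relfrozenxid_history[i] > relfrozenxid_history[i - 1]:
            break
        stalls += 1
    return stalls


def looks_pathological(relfrozenxid_history, min_stalls=2):
    """Hypothetical rule: one stalled VACUUM proves little, but several
    in a row suggest something like a long-held pin."""
    return consecutive_stalls(relfrozenxid_history) >= min_stalls
```

For example, a history of [100, 250, 400, 400, 400] has two trailing
stalls and would be flagged, while [100, 250, 400] would not.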

It seems wise to make the most modest possible assumptions about
what's going on here. We might well "get lucky" before the next VACUUM
comes around when we encounter what at first appears to be a
problematic case involving an idle cursor -- for all kinds of reasons.
Like maybe an opportunistic prune gets rid of the old XID for us,
without any freezing, during some brief window where the application
doesn't have a cursor. We're only talking about one or two heap pages
here.

We might also *not* "get lucky" with the application and its use of
idle cursors, of course. But in that case we must have been doomed all
along. And we'll at least have put things on a much better footing in
this disaster scenario -- there is relatively little freezing left to
do in single user mode, and relfrozenxid should already be the same as
the exact oldest XID in that one page.

> I think I agree, up to a point. One consequence of failing to
> immediately advance relfrozenxid might be that pg_clog and friends are
> bigger, but that's pretty minor.

My arguments are probabilistic (sort of), which makes it tricky.
Actual test cases/benchmarks should bear out the claims that I've
made. If anything fully convinces you, it'll be that, I think.

> Another consequence might be that we
> might vacuum the table more times, which is more serious. I'm not
> really sure that can happen to a degree that is meaningful, apart from
> the infinite loop case already described, but I'm also not entirely
> sure that it can't.

It's definitely true that this overall strategy could result in there
being more individual VACUUM operations. But that naturally
follows from teaching VACUUM to avoid waiting indefinitely.

Obviously the important question is whether we'll do
meaningfully more work for less benefit (in Postgres 15, relative to
Postgres 14). Your concern is very reasonable. I just can't imagine
how we could lose out to any notable degree. Which is a start.

--
Peter Geoghegan
