Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date: 2022-02-20 15:30:21
Message-ID: CA+TgmoYu3utkEfkuY-wm9PRSCgANvxdUYhU8G3qzo7C+eD9Hrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 18, 2022 at 7:12 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> We have to worry about XIDs from MultiXacts (and xmax values more
> generally). And we have to worry about the case where we start out
> with only xmin frozen (by an earlier VACUUM), and then have to freeze
> xmax too. I believe that we have to generally consider xmin and xmax
> independently. For example, we cannot ignore xmax, just because we
> looked at xmin, since in general xmin alone might have already been
> frozen.

Right, so we at least need to add a similar comment to what I proposed
for MXIDs, and maybe other changes are needed, too.

> The difference between the cleanup lock path (in
> lazy_scan_prune/heap_prepare_freeze_tuple) and the share lock path (in
> lazy_scan_noprune/heap_tuple_needs_freeze) is what is at issue in both
> of these confusing comment blocks, really. Note that cutoff_xid is the
> name that both heap_prepare_freeze_tuple and heap_tuple_needs_freeze
> have for FreezeLimit (maybe we should rename every occurence of
> cutoff_xid in heapam.c to FreezeLimit).
>
> At a high level, we aren't changing the fundamental definition of an
> aggressive VACUUM in any of the patches -- we still need to advance
> relfrozenxid up to FreezeLimit in an aggressive VACUUM, just like on
> HEAD, today (we may be able to advance it *past* FreezeLimit, but
> that's just a bonus). But in a non-aggressive VACUUM, where there is
> still no strict requirement to advance relfrozenxid (by any amount),
> the code added by 0001 can set relfrozenxid to any known safe value,
> which could either be from before FreezeLimit, or after FreezeLimit --
> almost anything is possible (provided we respect the relfrozenxid
> invariant, and provided we see that we didn't skip any
> all-visible-not-all-frozen pages).
>
> Since we still need to "respect FreezeLimit" in an aggressive VACUUM,
> the aggressive case might need to wait for a full cleanup lock the
> hard way, having tried and failed to do it the easy way within
> lazy_scan_noprune (lazy_scan_noprune will still return false when any
> call to heap_tuple_needs_freeze for any tuple returns false) -- same
> as on HEAD, today.
>
> And so the difference at issue here is: FreezeLimit/cutoff_xid only
> needs to affect the new NewRelfrozenxid value we use for relfrozenxid in
> heap_prepare_freeze_tuple, which is involved in real freezing -- not
> in heap_tuple_needs_freeze, whose main purpose is still to help us
> avoid freezing where a cleanup lock isn't immediately available. While
> the purpose of FreezeLimit/cutoff_xid within heap_tuple_needs_freeze
> is to determine its bool return value, which will only be of interest
> to the aggressive case (which might have to get a cleanup lock and do
> it the hard way), not the non-aggressive case (where ratcheting back
> NewRelfrozenxid is generally possible, and generally leaves us with
> almost as good of a value).
>
> In other words: the calls to heap_tuple_needs_freeze made from
> lazy_scan_noprune are simply concerned with the page as it actually
> is, whereas the similar/corresponding calls to
> heap_prepare_freeze_tuple from lazy_scan_prune are concerned with
> *what the page will actually become*, after freezing finishes, and
> after lazy_scan_prune is done with the page entirely (ultimately
> the final NewRelfrozenxid value set in pg_class.relfrozenxid only has
> to be <= the oldest extant XID *at the time the VACUUM operation is
> just about to end*, not some earlier time, so "being versus becoming"
> is an interesting distinction for us).
>
> Maybe the way that FreezeLimit/cutoff_xid is overloaded can be fixed
> here, to make all of this less confusing. I only now fully realized
> how confusing all of this stuff is -- very.

Right. I think I understand all of this, or at least most of it -- but
not from the comment. The question is how the comment can be more
clear. My general suggestion is that function header comments should
have more to do with the behavior of the function than how it fits
into the bigger picture. If it's clear to the reader what conditions
must hold before calling the function and which must hold on return,
it helps a lot. IMHO, it's the job of the comments in the calling
function to clarify why we then choose to call that function at the
place and in the way that we do.

> As a general rule, we try to freeze all of the remaining live tuples
> on a page (following pruning) together, as a group, or none at all.
> Most of the time this is triggered by our noticing that the page is
> about to be set all-visible (but not all-frozen), and doing work
> sufficient to mark it fully all-frozen instead. Occasionally there is
> FreezeLimit to consider, which is now more of a backstop thing, used
> to make sure that we never get too far behind in terms of unfrozen
> XIDs. This is useful in part because it avoids any future
> non-aggressive VACUUM that is fundamentally unable to advance
> relfrozenxid (you can't skip all-visible pages if there are only
> all-frozen pages in the VM in practice).
>
> We're generally doing a lot more freezing with 0002, but we still
> manage to avoid freezing too much in tables like pgbench_tellers or
> pgbench_branches -- tables where it makes the least sense. Such tables
> will be updated so frequently that VACUUM is relatively unlikely to
> ever mark any page all-visible, avoiding the main criteria for
> freezing implicitly. It's also unlikely that they'll ever have an XID that is so
> old to trigger the fallback FreezeLimit-style criteria for freezing.
>
> In practice, freezing tuples like this is generally not that expensive in
> most tables where VACUUM freezes the majority of pages immediately
> (tables that aren't like pgbench_tellers or pgbench_branches), because
> they're generally big tables, where the overhead of FPIs tends
> to dominate anyway (gambling that we can avoid more FPIs later on is not a
> bad gamble, as gambles go). This seems to make the overhead
> acceptable, on balance. Granted, you might be able to poke holes in
> that argument, and reasonable people might disagree on what acceptable
> should mean. There are many value judgements here, which makes it
> complicated. (On the other hand we might be able to do better if there
> was a particularly bad case for the 0002 work, if one came to light.)

I think that the idea has potential, but I don't think that I
understand yet what the *exact* algorithm is. Maybe I need to read the
code, when I have some time for that. I can't form an intelligent
opinion at this stage about whether this is likely to be a net
positive.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2022-02-20 17:56:08 Re: Slow standby snapshot
Previous Message Dong Wook Lee 2022-02-20 15:20:00 Re: Print warning when I execute my own extension function