Re: New strategies for freezing, advancing relfrozenxid early

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2023-01-27 18:40:10
Message-ID: CAH2-WznKEu+phDt1PN4MicahH9d+PC6ckf21knCyzG-hs=RAiA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 27, 2023 at 12:52 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I agree with bringing high-level context into the decision about whether to
> freeze aggressively - my problem with the eager freezing strategy patch isn't
> that it did that too much, it's that it didn't do it enough.
>
>
> But I also don't think what I describe above is really comparable to "table
> level" eager freezing though - the potential worst case overhead is a small
> fraction of the WAL volume, and there's zero increase in data write volume.

All I meant was that I initially thought that you were trying to
replace the FPI thing with something at the same level of ambition,
that could work in a low context way. But I now see that you're
actually talking about something quite a bit more ambitious for
Postgres 16, which is structurally similar to a freezing strategy,
from a code point of view -- it relies on high-level context for the
VACUUM/table as a whole. I wasn't equating it with the eager freezing
strategy in any other way.

It might also be true that this other thing happens to render the FPI
mechanism redundant. I'm actually not completely sure that it will
just yet. Let me verify my understanding of your proposal:

You mean that we'd take the page LSN before doing anything with the
page, right at the top of lazy_scan_prune, at the same point that
"fpi_before" is initialized currently. Then, if we subsequently
dirtied the page (as determined by its LSN, so as to focus on "dirtied
via WAL logged operation") during pruning, *and* if the "lsn_before"
of the page was from before our cutoff (derived via "lsn_threshold =
insert_lsn - (insert_lsn - lsn_of_last_vacuum) * 0.1" or similar),
*and* if the page is eligible to become all-frozen, then we'd freeze
the page.

That's it, right? It's about pages that *we* (VACUUM) dirtied, and
wrote records and/or FPIs for already?
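
To make the three conditions above concrete, here is a minimal sketch of the test as I read it. It is not code from any patch: lsn_t is a stand-in for XLogRecPtr, and freeze_lsn_threshold() and should_freeze_page() are hypothetical names. The 0.1 factor comes from the formula quoted above and is applied as an integer division by 10 to avoid floating point on 64-bit LSNs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t lsn_t;

/*
 * Hypothetical helper implementing
 *   lsn_threshold = insert_lsn - (insert_lsn - lsn_of_last_vacuum) * 0.1
 * using integer arithmetic.
 */
lsn_t
freeze_lsn_threshold(lsn_t insert_lsn, lsn_t lsn_of_last_vacuum)
{
    /* Assumption: treat a last-vacuum LSN at or past insert_lsn as "no window". */
    if (lsn_of_last_vacuum >= insert_lsn)
        return insert_lsn;

    return insert_lsn - (insert_lsn - lsn_of_last_vacuum) / 10;
}

/*
 * Hypothetical decision function. lsn_before is the page LSN taken at the
 * top of lazy_scan_prune; lsn_after is the page LSN once pruning is done.
 */
bool
should_freeze_page(lsn_t lsn_before, lsn_t lsn_after,
                   lsn_t lsn_threshold, bool can_become_all_frozen)
{
    /* The page LSN moved, so VACUUM itself dirtied it via a WAL-logged change. */
    bool dirtied_by_vacuum = (lsn_after != lsn_before);

    /* The page's previous modification predates the recency cutoff. */
    bool not_recently_modified = (lsn_before < lsn_threshold);

    return dirtied_by_vacuum && not_recently_modified && can_become_all_frozen;
}
```

With insert_lsn = 1000 and lsn_of_last_vacuum = 0 the cutoff is 900, so a page whose LSN was 500 before pruning qualifies, while one whose LSN was 950 does not.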

> I suspect the absolute worst case of "always freeze dirty pages" is when a
> single tuple on the page gets updated immediately after every time we freeze
> the page - a single tuple is where the freeze record is the least space
> efficient. The smallest update is about the same size as the smallest freeze
> record. For that to amount to a large WAL increase you'd need a crazy rate of such
> updates interspersed with vacuums. In slightly more realistic cases (i.e. not
> columnless tuples that constantly get updated and freezing happening all the
> time) you end up with a reasonably small WAL rate overhead.

The other thing is that we'd be doing this in situations where we already
know that a VISIBLE record is required, which is comparable in size to
a FREEZE_PAGE record with one tuple/plan (around 64 bytes). The
smallest WAL records are mostly just generic WAL record header
overhead.

> Obviously that's a pointless workload, but I do think that
> analyzing the "outer boundaries" of the regression something can cause, can be
> helpful.

I agree about the "outer boundaries" being a useful guide.

> I think one way forward with the eager strategy approach would be to have a
> very narrow gating condition for now, and then incrementally expand it in
> later releases.
>
> One use-case where the eager strategy is particularly useful is
> [nearly-]append-only tables - and it's also the one workload that's reasonably
> easy to detect using stats. Maybe something like
> (dead_tuples_since_last_vacuum / inserts_since_last_vacuum) < 0.05
> or so.
>
> That'll definitely leave out loads of workloads where eager freezing would be
> useful - but are there semi-reasonable workloads where it'll hurt badly? I
> don't *think* so.
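
For reference, the quoted gating condition could be sketched roughly as below. This is only an illustration: looks_append_mostly() is a hypothetical name, the two counters are assumed to come from the cumulative stats system, and the handling of a zero insert count is my own assumption (the quoted ratio is undefined in that case).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check for a [nearly-]append-only table:
 *   (dead_tuples_since_last_vacuum / inserts_since_last_vacuum) < 0.05
 */
bool
looks_append_mostly(uint64_t dead_tuples_since_last_vacuum,
                    uint64_t inserts_since_last_vacuum)
{
    /* Assumption: with no inserts there is no evidence of an append-only pattern. */
    if (inserts_since_last_vacuum == 0)
        return false;

    /* Compare as doubles so very large counters cannot overflow. */
    return (double) dead_tuples_since_last_vacuum /
           (double) inserts_since_last_vacuum < 0.05;
}
```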

I have no further plans to work on eager freezing strategy, or
anything of the sort, in light of recent developments. My goal at this
point is very unambitious: to get the basic page-level freezing work
into a form that makes sense as a standalone thing for Postgres 16. To
put things on a good footing, so that I can permanently bow out of all
work on VACUUM having left everything in good order. That's all.

Now, that might still mean that I'd facilitate future work of this
sort, by getting the right basic structure in place. But my
involvement in any work on freezing or anything of the sort ends here,
both as a patch author and a committer of anybody else's work. I'm
proud of the work I've done on VACUUM, but I'm keen to move on from
it.

> > What about unlogged/temporary tables? The obvious thing to do there is
> > what I did in the patch that was reverted (freeze whenever the page
> > will thereby become all-frozen), and forget about LSNs. But you have
> > already objected to that part, specifically.
>
> My main concern about that is the data write amplification it could cause when
> the page is clean when we start freezing. But I can't see a large potential
> downside to always freezing unlogged/temp tables when the page is already
> dirty.

But we have to dirty the page anyway, just to set PD_ALL_VISIBLE. That
was always a gating condition. Actually, that may have depended on not
having SKIP_PAGES_THRESHOLD, which the vm snapshot infrastructure
would have removed. That's not happening now, so I may need to
reassess. But even with SKIP_PAGES_THRESHOLD, it should be fine.

> > BTW, you still haven't changed the fact that you get rather different
> > behavior with checksums/wal_log_hints. I think that that's good, but
> > you didn't seem to.
>
> I think that, if we had something like the recency test I was talking about,
> we could afford to always freeze when the page is already dirty and not very
> recently modified. I.e. not even insist on a WAL record having been generated
> during pruning/HTSV. But I need to think through the dangers of that more.

Now I'm confused. I thought that the recency test you talked about was
purely to be used to do something a bit like the FPI thing, but using
some high level context. Now I don't know what to think.

--
Peter Geoghegan
