Re: New strategies for freezing, advancing relfrozenxid early

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2023-01-26 23:36:52
Message-ID: CAH2-Wz=eCBQCD16pG2SxiG1GzGh53T9gOKVj_Hdaj-DJjQQMew@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 26, 2023 at 12:45 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Most of the overhead of FREEZE WAL records (with freeze plan
> > deduplication and page-level freezing in) is generic WAL record header
> > overhead. Your recent adversarial test case is going to choke on that,
> > too. At least if you set checkpoint_timeout to 1 minute again.
>
> I don't quite follow. What do you mean by "record header overhead"? Unless
> that includes FPIs, I don't think that's that commonly true?

Even if there are no directly observable FPIs, there is still extra
WAL, which can cause FPIs indirectly, just by making checkpoints more
frequent. I feel ridiculous even having to explain this to you.
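
To make the indirect effect concrete, here is a toy model (not
PostgreSQL code). It assumes that a checkpoint fires once a fixed number
of WAL bytes has accumulated, loosely in the spirit of max_wal_size, and
that the first change to a page after a checkpoint carries a full-page
image. The function name, parameters, and workload are all invented for
illustration.

```python
def simulate(n_pages, n_mods, record_bytes, checkpoint_wal_bytes,
             page_bytes=8192):
    """Toy model: round-robin page modifications with WAL-driven checkpoints.

    Returns (fpis, checkpoints, total_wal_bytes). All names and the
    checkpoint rule are assumptions made for this sketch.
    """
    wal_since_checkpoint = 0
    touched = set()          # pages already modified since the last checkpoint
    fpis = checkpoints = total_wal = 0

    for i in range(n_mods):
        page = i % n_pages
        cost = record_bytes
        if page not in touched:
            # First change since the last checkpoint: add a full-page image.
            cost += page_bytes
            fpis += 1
            touched.add(page)

        wal_since_checkpoint += cost
        total_wal += cost

        if wal_since_checkpoint >= checkpoint_wal_bytes:
            # Enough WAL has accumulated: checkpoint, so every page needs
            # a fresh full-page image on its next modification.
            checkpoints += 1
            wal_since_checkpoint = 0
            touched.clear()

    return fpis, checkpoints, total_wal


small = simulate(n_pages=10, n_mods=1000, record_bytes=100,
                 checkpoint_wal_bytes=200_000)
large = simulate(n_pages=10, n_mods=1000, record_bytes=200,
                 checkpoint_wal_bytes=200_000)
print(small)  # (10, 0, 181920)
print(large)  # (20, 1, 363840)
```

In this toy run, doubling the per-record size adds 100,000 bytes of
record data directly, but total WAL grows by 181,920 bytes, because the
larger records push the system over the checkpoint threshold once and
every page then needs a second full-page image. The numbers are
artifacts of the chosen parameters; only the direction of the effect is
the point.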

> The problematic case I am talking about is when we do *not* emit a WAL record
> during pruning (because there's nothing to prune), but want to freeze the
> table. If you don't log an FPI, the remaining big overhead is that increasing
> the LSN on the page will often cause an XLogFlush() when writing out the
> buffer.
>
> I don't see what your reference to checkpoint timeout is about here?
>
> Also, as I mentioned before, the problem isn't specific to checkpoint_timeout
> = 1min. It just makes it cheaper to reproduce.

That's flagrantly intellectually dishonest. Sure, it made it easier to
reproduce. But that's not all it did!

You had *lots* of specific numbers and technical details in your first
email, such as "Time for vacuuming goes up to ~5x. WAL volume to
~9x.". But you did not feel that it was worth bothering with details
like having set checkpoint_timeout to 1 minute, which is a setting
that nobody uses, and obviously had a multiplicative effect. That
detail was unimportant. I had to drag it out of you!

You basically found a way to add WAL overhead to a system/workload
that is already in a write amplification vicious cycle, with latent
tipping point type behavior.

There is a practical point here, that is equally obvious, and yet
somehow still needs to be said: benchmarks like that one are basically
completely free of useful information. If we can't agree on how to
assess such things in general, then what can we agree on when it comes
to what should be done about it, what trade-off to make, when it comes
to any similar question?

> > In many cases we'll have to dirty the page anyway, just to set
> > PD_ALL_VISIBLE. The whole way the logic works is conditioned (whether
> > triggered by an FPI or triggered by my now-reverted GUC) on being able
> > to set the whole page all-frozen in the VM.
>
> IIRC setting PD_ALL_VISIBLE doesn't trigger an FPI unless we need to log hint
> bits. But freezing does trigger one even without wal_log_hints.

That is correct.

> You're right, it makes sense to consider whether we'll emit an
> XLOG_HEAP2_VISIBLE anyway.

As written the page-level freezing FPI mechanism probably doesn't
really stand to benefit much from doing that. Either checksums are
disabled and it's just a hint, or they're enabled and there is a very
high chance that we'll get an FPI inside lazy_scan_prune rather than
right after it is called, when PD_ALL_VISIBLE is set.

That's not perfect, of course, but it doesn't have to be. Perhaps it
should still be improved, just on general principle.

> > > A less aggressive version would be to check if any WAL records were emitted
> > > during heap_page_prune() (instead of FPIs) and whether we'd emit an FPI if we
> > > modified the page again. Similar to what we do now, except not requiring an
> > > FPI to have been emitted.
> >
> > Also way more aggressive. Not nearly enough on its own.
>
> In which cases will it be problematically more aggressive?
>
> If we emitted a WAL record during pruning we've already set the LSN of the
> page to a very recent LSN. We know the page is dirty. Thus we'll already
> trigger an XLogFlush() during ringbuffer replacement. We won't emit an FPI.

You seem to be talking about this as if the only thing that could
matter is the immediate FPI -- the first order effects -- and not any
second order effects. You certainly didn't get to 9x extra WAL
overhead by controlling for that before. Should I take it that you've
decided to assess these things more sensibly now? Out of curiosity:
why the change of heart?

> > > But to me it seems a bit odd that VACUUM now is more aggressive if checksums /
> > > wal_log_hints is on, than without them. Which I think is how using either
> > > of pgWalUsage.wal_fpi, pgWalUsage.wal_records ends up working?
> >
> > Which part is the odd part? Is it odd that page-level freezing works
> > that way, or is it odd that page-level checksums work that way?
>
> That page-level freezing works that way.

I think that it will probably cause a little confusion, and should be
specifically documented. But other than that, it seems reasonable
enough to me. I mean, should I not do something that's going to be of
significant help to users with checksums, just because it'll be
somewhat confusing to a small minority of them?

> > In any case this seems like an odd thing for you to say, having
> > eviscerated a patch that really just made the same behavior trigger
> > independently of FPIs in some tables, controlled via a GUC.
>
> jdksjfkjdlkajsd;lfkjasd;lkfj;alskdfj
>
> That behaviour I criticized was causing a torrent of FPIs and additional
> dirtying of pages. My proposed replacement for the current FPI check doesn't,
> because a) it only triggers when we wrote a WAL record b) It doesn't trigger
> if we would write an FPI.
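
For reference, the two triggers being compared in this exchange can be
sketched as simple predicates. This is a paraphrase of the thread, not
the actual vacuumlazy.c logic: the PruneOutcome type and its field names
are hypothetical, and the "current" rule is reduced to the two
conditions mentioned here (an FPI was emitted, and the page could be set
all-frozen in the VM).

```python
from dataclasses import dataclass


@dataclass
class PruneOutcome:
    """Hypothetical summary of what happened while processing one heap page."""
    wal_records_emitted: int        # WAL records written during pruning
    fpis_emitted: int               # full-page images written during pruning
    page_would_be_all_frozen: bool  # freezing lets the VM bit be set all-frozen
    next_change_needs_fpi: bool     # modifying the page again would log an FPI


def current_trigger(o: PruneOutcome) -> bool:
    # As described in the thread: freeze early only if an FPI was already
    # emitted and the whole page can become all-frozen.
    return o.page_would_be_all_frozen and o.fpis_emitted > 0


def proposed_trigger(o: PruneOutcome) -> bool:
    # The quoted proposal: (a) some WAL record was written during pruning,
    # and (b) freezing now would not itself require an FPI.
    return (o.page_would_be_all_frozen
            and o.wal_records_emitted > 0
            and not o.next_change_needs_fpi)


# A page that was pruned (one WAL record) without any FPI:
pruned_no_fpi = PruneOutcome(wal_records_emitted=1, fpis_emitted=0,
                             page_would_be_all_frozen=True,
                             next_change_needs_fpi=False)
print(current_trigger(pruned_no_fpi), proposed_trigger(pruned_no_fpi))
```

The example page is the case where the two rules diverge: pruning wrote
a record but no FPI, so only the proposed rule would freeze it.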

It increases the WAL written in many important cases that
vacuum_freeze_strategy_threshold avoided. Sure, it did have some
problems, but the general idea of adding some high level
context/strategies seems essential to me.

You also seem to be suggesting that your proposed change to how basic
page-level freezing works will make freezing of pages on databases
with page-level checksums similar to an equivalent case without
checksums enabled. Even assuming that that's an important goal, you
won't be much closer to achieving it under your scheme, since hint
bits being set during VACUUM and requiring an FPI still make a huge
difference. Tables like pgbench_history have pages that generally
aren't pruned, that don't need to log an FPI just to set
PD_ALL_VISIBLE once checksums are disabled.

That's the difference that users are going to notice between checksums
enabled vs disabled, if they notice any -- it's the most important one
by far.

--
Peter Geoghegan
