Re: Eager page freeze criteria clarification

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Eager page freeze criteria clarification
Date: 2023-09-25 20:19:22
Message-ID: CA+TgmoZUEN-F9AiKMZjFsn3VtYd+KHpc5py7ozjG=rdqC7apDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 23, 2023 at 3:53 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> Freeze tuples on a page opportunistically if the page would be totally
> frozen and:
>
> 4. Buffer is already dirty and no FPI is required OR page LSN is older
> than 33% of the LSNs since the last vacuum of the table.
>
> 5. Buffer is already dirty and no FPI is required AND page LSN is older
> than 33% of the LSNs since the last vacuum of the table.
>
> On master, the heuristic is to freeze a page opportunistically if it
> would be totally frozen and if pruning emitted an FPI.
> -------
>
> My takeaways from all of the workload results are as follows:
>
> Algorithm 4 is too aggressive and regresses performance compared to
> master.
>
> Algorithm 5 freezes surprisingly few pages, especially in workloads
> where only the most recent data is being accessed or modified
> (append-only, update recent data).
>
> A work queue-like workload with other concurrent workloads is a
> particular challenge for the freeze heuristic, and we should think more
> about how to handle this.

I feel like we have a sort of Goldilocks problem here: the porridge is
either too hot or too cold, never just right. Say we just look at
workload B, looking at the difference between pgbench_accounts (which
is randomly and frequently updated and thus shouldn't be
opportunistically frozen) and pgbench_history (which is append-only
and should thus be frozen aggressively). Algorithm 4 gets
pgbench_history right and pgbench_accounts wrong, and master does the
opposite. In a perfect world, we'd have an algorithm which could
distinguish sharply between those cases, ramping up to maximum
aggressiveness on pgbench_history while doing nothing at all to
pgbench_accounts.

Algorithm 5 partially accomplishes this, but the results aren't
super-inspiring either. It doesn't add many page freezes in the case
where freezing is bad, but it also only manages to freeze a quarter of
pgbench_history, where algorithm 4 manages to freeze basically all of
it. That's a pretty stark difference. Given that algorithm 5 seems to
make some mistakes on some of the other workloads, I don't think it's
entirely clear that it's an improvement over master, at least in
practical terms.

It might be worth thinking harder about what it takes specifically to
get the pgbench_history case, aka the append-only table case, correct.
One thing that probably doesn't work very well is to freeze pages that
are more than X minutes old. Algorithm 5 uses an LSN threshold instead
of a wall-clock based threshold, but the effect is the same. I think
the problem here is that the vacuum operation essentially happens in
an instant. At the instant that it happens, some fraction of the data
added since the last vacuum is older than whatever threshold you pick,
and the rest is newer. If data is added at a constant rate and you
want to freeze at least 90% of the data, your recency threshold has to
be no more than 10% of the time since the last vacuum. But with
autovacuum_naptimes=60s, that's like 6 seconds, and that's way too
aggressive for a table like pgbench_accounts. It seems to me that it's
not possible to get both cases right by twiddling the threshold,
because pgbench_history wants the threshold to be 0, and
pgbench_accounts wants it to be ... perhaps not infinity, because
maybe the distribution is Gaussian or Zipfian or something rather than
uniform, but probably a couple of minutes.

So I feel like if we want to get both pgbench_history and
pgbench_accounts right, we need to consider some additional piece of
information that makes those cases distinguishable. Either of those
tables can contain a page that hasn't been accessed in 20 seconds, but
the correct behavior for such a page differs between one case and the
other. One random idea that I had was to refuse to opportunistically
freeze a page more than once while it remains resident in
shared_buffers. The first time we do it, we set a bit in the buffer
header or something that suppresses further opportunistic freezing.
When the buffer is evicted the bit is cleared. So we can still be
wrong on a heavily updated table like pgbench_acccounts, but if the
table fits in shared_buffers, we'll soon realize that we're getting it
wrong a lot and will stop making the same mistake over and over. But
this kind of idea only works if the working set is small enough to fit
in shared_buffers, so I don't think it's actually a great plan, unless
we only care about suppressing excess freezing on workloads that fit
in shared_buffers.

A variant on the same theme could be to keep some table-level counters
and use them to assess how often we're getting it wrong. If we're
often thawing recently-frozen pages, don't freeze so aggressively. But
this will not work if different parts of the same table behave
differently.

If we don't want to do something like this that somehow responds to
the characteristics of a particular page or table, then it seems we
either have to freeze quite aggressively to pick up insert-only cases
and accept that this will lead to some wasted effort in heavy-update
cases, or else freeze less aggressively and accept that we're going
not going to freeze insert-only pages consistently.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Jungwirth 2023-09-25 20:20:59 Re: SQL:2011 application time
Previous Message Joe Conway 2023-09-25 20:11:18 Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }