Re: Eager page freeze criteria clarification

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Eager page freeze criteria clarification
Date: 2023-08-28 20:30:15
Message-ID: CAAKRu_YzowY80dsktvykUCEJBE0Mco7SuBnvGED2_XyuC_3P=g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 28, 2023 at 12:26 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Aug 28, 2023 at 10:00 AM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> Then there's the question of whether it's the right metric. My first
> reaction is to think that it sounds pretty good. One thing I really
> like about it is that if the table is being vacuumed frequently, then
> we freeze less aggressively, and if the table is being vacuumed
> infrequently, then we freeze more aggressively. That seems like a very
> desirable property. It also seems broadly good that this metric
> doesn't really care about reads. If there are a lot of reads on the
> system, or no reads at all, it doesn't really change the chances that
> a certain page is going to be written again soon, and since reads
> don't change the insert LSN, here again it seems to do the right
> thing. I'm a little less clear about whether it's good that it doesn't
> really depend on wall-clock time. Certainly, that's desirable from the
> point of view of not wanting to have to measure wall-clock time in
> places where we otherwise wouldn't have to, which tends to end up
> being expensive. However, if I were making all of my freezing
> decisions manually, I might be more freeze-positive on a low-velocity
> system where writes are more stretched out across time than on a
> high-velocity system where we're blasting through the LSN space at a
> higher rate. But maybe that's not a very important consideration, and
> I don't know what we'd do about it anyway.

By low-velocity, do you mean lower overall TPS? In that case, wouldn't you be
less likely to run into xid wraparound and thus need less aggressive
opportunistic freezing?
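
That said, to put the velocity point in concrete terms: the metric is
measured purely in LSN distance, so the same LSN age maps to very
different wall-clock ages depending on how fast the system generates
WAL. A toy illustration (entirely made-up numbers; nothing here is from
the patch):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    /* page last modified 64MB of WAL ago (an LSN distance of 64MB) */
    uint64_t    page_age_lsn = UINT64_C(64) * 1024 * 1024;

    /* hypothetical WAL generation rates, in bytes per second */
    double      low_velocity = 1.0 * 1024 * 1024;        /* ~1MB/s */
    double      high_velocity = 100.0 * 1024 * 1024;     /* ~100MB/s */

    printf("low-velocity system:  page is ~%.0f seconds old\n",
           page_age_lsn / low_velocity);
    printf("high-velocity system: page is ~%.2f seconds old\n",
           page_age_lsn / high_velocity);
    return 0;
}

On the low-velocity system the page is about a minute old in wall-clock
terms; on the high-velocity system it is well under a second old, even
though the metric sees the same age in both cases. Whether that
difference ought to influence the freezing decision is the open
question.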

> > Page Freezes/Page Frozen (less is better)
> >
> > |   | Master |     (1) |     (2) |     (3) |     (4) |     (5) |
> > |---+--------+---------+---------+---------+---------+---------|
> > | A |  28.50 |    3.89 |    1.08 |    1.15 |    1.10 |    1.10 |
> > | B |   1.00 |    1.06 |    1.65 |    1.03 |    1.59 |    1.00 |
> > | C |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
> > | D |   2.00 | 5199.15 | 5276.85 | 4830.45 | 5234.55 | 2193.55 |
> > | E |   7.90 |    3.21 |    2.73 |    2.70 |    2.69 |    2.43 |
> > | F |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
> > | G |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
> > | H |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
> > | I |    N/A |   42.00 |   42.00 |     N/A |   41.00 |     N/A |
>
> Hmm. I would say that the interesting rows here are A, D, and I, with
> rows C and E deserving honorable mention. In row A, master is bad.

So, this is where the caveat about the absolute number of page freezes
matters. For workload A, master did only 57 page freezes (spread across
the various pgbench tables), and only 2 pages were still frozen at the
end of the run -- hence the 28.50 ratio above.

> In row D, your algorithms are all bad, really bad. I don't quite
> understand how it can be that bad, actually.

So, I realize now that this test was poorly designed. I meant it to be
a worst-case scenario, but one critical part was wrong. In this example,
one client runs at full speed, inserting a row and then updating it,
while another, rate-limited client periodically deletes old data to keep
the table at a constant size. I meant to bulk load the table with enough
data that the delete job would have data to delete from the start, but
the run was missing that initial load. With the default autovacuum
settings, over the course of 45 minutes, I usually saw around 40
autovacuums of the table. Because the delete client is rate-limited, the
first autovacuum of the table ends up freezing many pages that are
deleted soon after, so the total number of page freezes is very high.

I will redo the benchmarks for workload D and start the table with the
number of rows that the DELETE job seeks to maintain. My
back-of-the-envelope math says that this should bring the ratios down
to around a dozen (rather than 5000).

Also, I had doubled checkpoint_timeout, which likely explains why
master froze so few pages here (2 freezes total, neither of which was
still frozen at the end of the run): master only freezes
opportunistically when an FPI has already been emitted for the page,
and a longer checkpoint interval means fewer FPIs. This is another
example where master's overall low number of page freezes makes it
difficult to compare it to the alternatives using a ratio.

I didn't initially question the numbers, since freezing data and then
deleting it shortly afterward seems like it would naturally be one of
the worst cases for opportunistic freezing -- though it certainly
shouldn't be this bad.

> Row I looks bad for algorithms 1, 2, and 4: they freeze pages because
> it looks cheap, but the work doesn't really pay off.

Yes, the work queue example looks like it is hard to handle.

> > % Frozen at end of run
> >
> > |   | Master | (1) | (2) | (3) | (4) | (5) |
> > |---+--------+-----+-----+-----+-----+-----|
> > | A |      0 |   1 |  99 |   0 |  81 |   0 |
> > | B |     71 |  96 |  99 |   3 |  98 |   2 |
> > | C |      0 |   9 | 100 |   6 |  92 |   5 |
> > | D |      0 |   1 |   1 |   1 |   1 |   1 |
> > | E |      0 |  63 | 100 |  68 | 100 |  67 |
> > | F |      0 |   5 |  14 |   6 |  14 |   5 |
> > | G |      0 | 100 | 100 |  92 | 100 |  67 |
> > | H |      0 |  11 | 100 |   9 |  86 |   5 |
> > | I |      0 | 100 | 100 |   0 | 100 |   0 |
>
> So all of the algorithms here, but especially 1, 2, and 4, freeze a
> lot more often than master.
>
> If I understand correctly, we'd like to see small numbers for B, D,
> and I, and large numbers for the other workloads. None of the
> algorithms seem to achieve that. (3) and (5) seem like they always
> behave as well or better than master, but they produce small numbers
> for A, C, F, and H. (1), (2), and (4) regress B and I relative to
> master but do better than (3) and (5) on A, C, and the latter two also
> on E.
>
> B is such an important benchmarking workload that I'd be loathe to
> regress it, so if I had to pick on the basis of this data, my vote
> would be (3) or (5), provided whatever is happening with (D) in the
> previous metric is not as bad as it looks. What's your reason for
> preferring (4) and (5) over (2) and (3)? I'm not clear that these
> numbers give us much of an idea whether 10% or 33% or something else
> is better in general.

(1) seems bad to me because it doesn't consider whether freezing will
be useful -- only whether it will be cheap. It froze very little of the
cold data in workloads where only a small percentage of the data was
being modified (especially workloads A, C, and H), and it froze a lot
of data in workloads where the data was being uniformly modified
(workload B).

I suggested (4) and (5) because I think the "older than 33%" threshold
is better than the "older than 10%" threshold. I included both because
I am still unclear on our priorities. Are we willing to freeze more
aggressively at the expense of emitting more FPIs, as long as it
doesn't affect throughput? For pretty much all of these workloads, the
algorithms that froze based on page modification recency OR on the FPI
criterion emitted many more FPIs than those that froze based on page
modification recency alone.
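
To make the distinction concrete, here is roughly the shape of these
criteria as I think of them. This is only a sketch, not the patch
logic: would_emit_fpi stands in for the check of whether freezing the
page would require a full page image, and page_age_fraction stands in
for the LSN-based modification-recency metric discussed up-thread (how
exactly to define that metric is part of what is under discussion).

#include <stdbool.h>

/* cheapness-only, as in (1): freeze whenever it looks cheap, i.e. no
 * new FPI would be needed */
static bool
freeze_if_cheap(bool would_emit_fpi)
{
    return !would_emit_fpi;
}

/* recency-only: freeze pages that have not been modified recently; the
 * threshold is the "older than 10%" / "older than 33%" knob */
static bool
freeze_if_old(double page_age_fraction, double threshold)
{
    return page_age_fraction > threshold;
}

/* recency OR cheapness: freezes at least as many pages as recency-only;
 * in the runs above, these variants also emitted many more FPIs overall */
static bool
freeze_if_old_or_cheap(double page_age_fraction, double threshold,
                       bool would_emit_fpi)
{
    return freeze_if_old(page_age_fraction, threshold) ||
           freeze_if_cheap(would_emit_fpi);
}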

I've attached the WIP patch that I forgot to include in my previous email.

I'll rerun workload D in a more reasonable way and be back with results.

- Melanie

Attachment Content-Type Size
WIP-opp_freeze_cold_data.patch text/x-patch 9.5 KB
