Re: Eager page freeze criteria clarification

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Eager page freeze criteria clarification
Date: 2023-09-28 04:03:11
Message-ID: CAH2-WzmY_ywKHgVQ-0a7MVwq8MAmzztsDsSjgdns7OtMmbFhhQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 27, 2023 at 6:35 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > if insert LSN - RedoRecPtr < insert LSN - page LSN
> > page is older than the most recent checkpoint start, so freeze it
> > regardless of whether or not it would emit an FPI
> >
> > What aggressiveness levels should there be? What should change at each
> > level? What criteria should pages have to meet to be subject to the
> > aggressiveness level?
>
> I'm thinking something very roughly along these lines could make sense:
>
> page_lsn_age = insert_lsn - page_lsn;
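
For reference, the quoted test can be written out as a minimal Python sketch. This is not PostgreSQL code: LSNs are modeled as plain integers, and the function name page_predates_checkpoint is made up for illustration.

```python
def page_predates_checkpoint(insert_lsn: int, redo_rec_ptr: int, page_lsn: int) -> bool:
    """Return True when the quoted test says the page is older than the
    most recent checkpoint start (hypothetical helper, integer LSNs)."""
    checkpoint_distance = insert_lsn - redo_rec_ptr  # WAL written since checkpoint start
    page_lsn_age = insert_lsn - page_lsn             # WAL written since the page last changed
    return checkpoint_distance < page_lsn_age


# insert_lsn appears on both sides, so the test reduces to page_lsn < redo_rec_ptr.
for insert_lsn, redo, page in [(1000, 800, 500), (1000, 800, 900), (1000, 800, 800)]:
    assert page_predates_checkpoint(insert_lsn, redo, page) == (page < redo)
```

Since the insert LSN cancels out, the comparison only asks whether the page was last modified before the redo pointer.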

While there is no reason to not experiment here, I have my doubts
about what you've sketched out. Most importantly, it doesn't have
anything to say about the cost of not freezing -- just the cost of
freezing. But isn't the main problem *not* freezing when we could and
should have? (Of course the cost of freezing is very relevant, but
it's still secondary.)

But even leaving that aside, I just don't get why this will work with
the case that you yourself emphasized earlier on: a workload with
inserts plus "hot tail" updates. If you run TPC-C according to spec,
there are about 12 or 14 hours between the initial inserts into the
orders and order lines tables (by a new order transaction), and the
subsequent updates (from the delivery transaction). When I run the
benchmark, I usually don't stick with the spec (it's rather limiting
on modern hardware), so it's more like 2 - 4 hours before each new
order is delivered (meaning updated in those two big tables). Either
way, it's a fairly long time relative to everything else.

Won't the algorithm that you've sketched always think that
"unfreezing" pages doesn't affect recently frozen pages with such a
workload? Isn't the definition of "recently frozen" that emerges from
this algorithm completely unrelated to the order delivery time, or
anything like that? You know, rather like vacuum_freeze_min_age.
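
A toy illustration of the concern, as a Python sketch under loudly stated assumptions: suppose an unfreeze only counts as feedback when it lands within some LSN window of the freeze. The window (recency_window), the steady WAL rate, and the checkpoint spacing are all hypothetical, and none of this is taken from the sketched algorithm itself.

```python
# Hypothetical numbers throughout; none come from a real system.
WAL_BYTES_PER_SEC = 10 * 1024 * 1024   # assumed steady WAL generation rate
CHECKPOINT_INTERVAL_SEC = 5 * 60       # assumed spacing between checkpoints


def lsn_distance(seconds: float) -> int:
    """WAL bytes generated over `seconds` at the assumed steady rate."""
    return int(seconds * WAL_BYTES_PER_SEC)


def unfreeze_counts_as_recent(freeze_lsn: int, modify_lsn: int, recency_window: int) -> bool:
    """True if the page was modified within `recency_window` of being frozen."""
    return modify_lsn - freeze_lsn <= recency_window


# Assume "recent" is tied to roughly one checkpoint's worth of WAL.
recency_window = lsn_distance(CHECKPOINT_INTERVAL_SEC)
freeze_lsn = 0

# Delivery delays of 2, 4, and 12 hours after the freeze.
verdicts = [
    unfreeze_counts_as_recent(freeze_lsn, freeze_lsn + lsn_distance(h * 3600), recency_window)
    for h in (2, 4, 12)
]
assert verdicts == [False, False, False]
```

Under these assumptions every delivery-time update falls outside the window, so the unfreezing is never attributed to a recent freeze, whatever the actual delivery delay happens to be.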

Separately, at one point you also said "Yes. If the ratio of
opportunistically frozen pages (which I'd define as pages that were
frozen not because they strictly needed to) vs the number of unfrozen
pages increases, we need to make opportunistic freezing less
aggressive and vice versa".

Can we expect a discount for freezing that happened to be very cheap
anyway, when that doesn't work out?

What about a page that we would have had to have frozen anyway (based
on the conventional vacuum_freeze_min_age criteria) not too long after
it was frozen by this new mechanism, that nevertheless became unfrozen
some time later? That is, a page where "the unfreezing" cannot
reasonably be blamed on the initial so-called opportunistic freezing,
because really it was a total accident involving when VACUUM showed
up? You know, just like we'd expect with the TPC-C tables.
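
One way to picture that attribution question, as a hedged Python sketch: only blame the eager freeze when the page was modified before the conventional criteria would have frozen it anyway. The function name and the abstract "time" values are hypothetical; any monotonic measure (XID age, LSN, wall clock) could stand in.

```python
def unfreeze_blamed_on_eager_freeze(conventional_freeze_at: float, unfrozen_at: float) -> bool:
    """Hypothetical rule: the eager freeze is only at fault if the page was
    modified before conventional freezing would have happened anyway."""
    return unfrozen_at < conventional_freeze_at


# Eager freeze at t=0; conventional criteria would have frozen at t=1 (made-up units).
assert unfreeze_blamed_on_eager_freeze(conventional_freeze_at=1.0, unfrozen_at=0.5)
# Page modified at t=3, well after conventional freezing would have occurred:
assert not unfreeze_blamed_on_eager_freeze(conventional_freeze_at=1.0, unfrozen_at=3.0)
```

In the second case the page would have been frozen and then unfrozen under either policy, so counting it against opportunistic freezing would be misleading.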

Aside: "unfrozen pages" seems to refer to pages that were frozen and
then became unfrozen, not pages that simply aren't frozen. Lots of
opportunities for confusion here.

I'm not saying that it's wrong to freeze like this in the specific case of
TPC-C. But do you really need to invent all this complicated
infrastructure, just to avoid freezing the same pages again in a tight
loop?

On a positive note, I like that what you've laid out freezes eagerly
when an FPI won't result -- this much we can all agree on. I guess
that that part is becoming uncontroversial.

--
Peter Geoghegan
