Re: Eager page freeze criteria clarification

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Eager page freeze criteria clarification
Date: 2023-09-27 23:09:41
Message-ID: CAAKRu_Y0nLmQ=YS1c2ORzLi7bu3eWjdx+32BuFc0Tho2o7E3rw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 27, 2023 at 3:25 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Sep 27, 2023 at 12:34 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > One way to deal with that would be to not track the average age in
> > LSN-difference-bytes, but convert the value to some age metric at that
> > time. If we e.g. were to convert the byte-age into an approximate age in
> > checkpoints, with quadratic bucketing (e.g. 0 -> current checkpoint, 1 -> 1
> > checkpoint, 2 -> 2 checkpoints ago, 3 -> 4 checkpoints ago, ...), using a mean
> > of that age would probably be fine.
>
> Yes. I think it's possible that we could even get by with just two
> buckets. Say current checkpoint and not. Or current-or-previous
> checkpoint and not. And just look at what percentage of accesses fall
> into this first bucket -- it should be small or we're doing it wrong.
> It seems like the only thing we actually need to avoid is freezing the
> same ages over and over again in a tight loop.

At the risk of seeming too execution-focused, I want to try and get more
specific. Here is a description of an example implementation to test my
understanding:

In table-level stats, keep two counters, younger_than_cpt and
older_than_cpt. Each counts the number of times a frozen page was
unfrozen, split by whether the page was younger or older than the start
of the most recent checkpoint at the time of its unfreezing.

Upon update or delete (and insert?), if the page being modified is
frozen:

    if insert LSN - RedoRecPtr > insert LSN - old page LSN
        (that is, old page LSN > RedoRecPtr)
        page is younger, younger_than_cpt += 1
    otherwise
        older_than_cpt += 1

The ratios younger/total and older/total (where total is the sum of the
two counters) can be used to determine how aggressive opportunistic
freezing will be.
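
To make the bookkeeping above concrete, here is a minimal sketch in C. It
is not existing PostgreSQL code: UnfreezeCounts, count_unfreeze() and
younger_fraction() are hypothetical names, and LSNs are modeled as a plain
uint64_t standing in for XLogRecPtr. Treating an empty table's fraction as
0 is also just an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t Lsn;

/* Hypothetical per-table counters, named as in the description above. */
typedef struct UnfreezeCounts
{
    uint64_t    younger_than_cpt;
    uint64_t    older_than_cpt;
} UnfreezeCounts;

/*
 * True if the page was last modified after the start of the most recent
 * checkpoint. Written in the same form as the pseudocode; it reduces to
 * old_page_lsn > redo_rec_ptr.
 */
static bool
page_younger_than_checkpoint(Lsn insert_lsn, Lsn redo_rec_ptr, Lsn old_page_lsn)
{
    /* Neither LSN may be ahead of the insert position (unsigned math). */
    assert(redo_rec_ptr <= insert_lsn && old_page_lsn <= insert_lsn);
    return insert_lsn - redo_rec_ptr > insert_lsn - old_page_lsn;
}

/* Call when an update/delete modifies a page that is currently frozen. */
static void
count_unfreeze(UnfreezeCounts *counts, Lsn insert_lsn, Lsn redo_rec_ptr,
               Lsn old_page_lsn)
{
    if (page_younger_than_checkpoint(insert_lsn, redo_rec_ptr, old_page_lsn))
        counts->younger_than_cpt++;
    else
        counts->older_than_cpt++;
}

/* younger / total; returns 0 when nothing has been counted yet. */
static double
younger_fraction(const UnfreezeCounts *counts)
{
    uint64_t    total = counts->younger_than_cpt + counts->older_than_cpt;

    if (total == 0)
        return 0.0;
    return (double) counts->younger_than_cpt / (double) total;
}
```

Note that a page whose LSN equals RedoRecPtr lands in older_than_cpt here,
because the pseudocode uses a strict comparison.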

This has the downside of counting most unfreezings directly after a
checkpoint in the older_than_cpt bucket, even for pages that were frozen
only shortly before that checkpoint started. That is: older_than_cpt !=
longer_frozen_duration at certain times in the checkpoint cycle.
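
A small sketch with made-up LSNs shows the skew. The function names and
all numbers are hypothetical, and it assumes the old page LSN is the LSN
of the freeze itself (that is, nothing else touched the page in between).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t Lsn;

/* True if the unfreezing would be counted in older_than_cpt. */
static bool
counted_as_older(Lsn insert_lsn, Lsn redo_rec_ptr, Lsn old_page_lsn)
{
    /* Neither LSN may be ahead of the insert position (unsigned math). */
    assert(redo_rec_ptr <= insert_lsn && old_page_lsn <= insert_lsn);
    return !(insert_lsn - redo_rec_ptr > insert_lsn - old_page_lsn);
}

/* WAL bytes between the freeze and the unfreeze. */
static uint64_t
frozen_duration(Lsn insert_lsn, Lsn old_page_lsn)
{
    assert(old_page_lsn <= insert_lsn);
    return insert_lsn - old_page_lsn;
}
```

With a checkpoint starting at LSN 1000, a page frozen at 990 and unfrozen
at 1010 and a page frozen at 1010 and unfrozen at 1030 both stay frozen
for 20 bytes of WAL, but counted_as_older() is true only for the first
one.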

Now, I'm trying to imagine how this would interact in a meaningful way
with opportunistic freezing behavior during vacuum.

You would likely want to combine it with one of the other heuristics we
discussed.

For example:
For a table with only 20% younger unfreezings, when vacuuming a page,

    if insert LSN - RedoRecPtr < insert LSN - page LSN
        page is older than the most recent checkpoint start, so freeze it
        regardless of whether or not it would emit an FPI
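
As a rough sketch of that vacuum-time check in C: the function name and
the MAX_YOUNGER_FRACTION cutoff are hypothetical. The 0.20 value is simply
lifted from the example above and is not a proposed setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t Lsn;

/* Hypothetical cutoff taken from the "only 20% younger" example. */
#define MAX_YOUNGER_FRACTION 0.20

/*
 * True if vacuum should freeze the page even when doing so would emit an
 * FPI: the table rarely unfreezes young pages, and this page was last
 * modified before the start of the most recent checkpoint.
 */
static bool
freeze_regardless_of_fpi(double younger_fraction, Lsn insert_lsn,
                         Lsn redo_rec_ptr, Lsn page_lsn)
{
    bool        page_is_older;

    /* Neither LSN may be ahead of the insert position (unsigned math). */
    assert(redo_rec_ptr <= insert_lsn && page_lsn <= insert_lsn);

    page_is_older = insert_lsn - redo_rec_ptr < insert_lsn - page_lsn;
    return younger_fraction <= MAX_YOUNGER_FRACTION && page_is_older;
}
```

A false result here only means this rule does not apply; the page would
then fall through to whichever other heuristic it is combined with.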

What aggressiveness levels should there be? What should change at each
level? What criteria should pages have to meet to be subject to the
aggressiveness level?

I have some ideas, but I'd like to try an algorithm along these lines
with an updated work queue workload and the insert-only workload. And I
want to make sure I understand the proposal first.

- Melanie
