Re: Eager page freeze criteria clarification

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Eager page freeze criteria clarification
Date: 2023-09-27 19:25:23
Message-ID: CA+Tgmoab6BVyxK+a0EwyCyj3ndb9gnObr1c3dXG7=kYo2EEnMg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 27, 2023 at 12:34 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> What do you mean with "always freeze aggressively" - do you mean 'aggressive'
> autovacuums? Or opportunistic freezing being aggressive? I don't know why the
> former would be the case?

I meant the latter.

> > When it grows large enough, we suddenly stop freezing it aggressively, and
> > now it starts experiencing vacuums that do a whole bunch of work all at
> > once. A user who notices that is likely to be pretty confused about what
> > happened, and maybe not too happy when they find out.
>
> Hm - isn't my proposal exactly the other way round? I'm proposing that a page
> is frozen more aggressively if not already in shared buffers - which will
> become more common once the table has grown "large enough"?

OK, but it's counterintuitive either way, IMHO.

> I think there were three main ideas that we discussed:
>
> 1) We don't need to be accurate in the freezing decisions for individual
> pages, we "just" need to avoid the situation that over time we commonly
> freeze pages that will be updated again "soon".
> 2) It might be valuable to adjust the "should freeze page opportunistically"
> based on feedback.
> 3) We might need to classify the workload for a table and use different
> heuristics for different workloads.

I agree with all of that. Good summary.

> One way to deal with that would be to not track the average age in
> LSN-difference-bytes, but convert the value to some age metric at that
> time. If we e.g. were to convert the byte-age into an approximate age in
> checkpoints, with quadratic bucketing (e.g. 0 -> current checkpoint, 1 -> 1
> checkpoint, 2 -> 2 checkpoints ago, 3 -> 4 checkpoints ago, ...), using a mean
> of that age would probably be fine.

Yes. I think it's possible that we could even get by with just two
buckets. Say current checkpoint and not. Or current-or-previous
checkpoint and not. And just look at what percentage of accesses fall
into this first bucket -- it should be small or we're doing it wrong.
It seems like the only thing we actually need to avoid is freezing the
same pages over and over again in a tight loop.
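
To make the bucketing ideas above concrete, here is a minimal sketch in Python (not PostgreSQL code). It assumes the LSN difference has already been converted into a whole number of checkpoints. The function names and the recent_threshold parameter are hypothetical, chosen only for illustration. The first function reproduces the quoted mapping (0 -> current checkpoint, 1 -> 1 checkpoint ago, 2 -> 2 checkpoints ago, 3 -> 4 checkpoints ago), in which each bucket starts at twice the age of the one before. The second implements the two-bucket variant, reporting what fraction of observed ages fall into the "recent" bucket.

```python
from typing import Iterable


def checkpoint_age_bucket(age_in_checkpoints: int) -> int:
    """Map an age in checkpoints to a coarse bucket number.

    Bucket 0 is the current checkpoint; bucket b >= 1 starts at an age of
    2**(b - 1) checkpoints, so buckets 1, 2, 3 start at ages 1, 2, 4.
    """
    if age_in_checkpoints < 0:
        raise ValueError("age must be non-negative")
    # For a non-negative int, bit_length() is 0 for 0 and
    # floor(log2(n)) + 1 otherwise, which is exactly this mapping.
    return age_in_checkpoints.bit_length()


def recent_fraction(ages: Iterable[int], recent_threshold: int = 0) -> float:
    """Two-bucket variant: fraction of ages that count as "recent".

    recent_threshold=0 means "current checkpoint and not";
    recent_threshold=1 means "current-or-previous checkpoint and not".
    """
    ages = list(ages)
    if not ages:
        return 0.0
    recent = sum(1 for age in ages if age <= recent_threshold)
    return recent / len(ages)


if __name__ == "__main__":
    # Hypothetical ages, in checkpoints, at which pages were accessed again.
    observed = [0, 0, 1, 5]
    print([checkpoint_age_bucket(a) for a in observed])
    print(recent_fraction(observed))
    print(recent_fraction(observed, recent_threshold=1))
```

In this sketch a large recent fraction would be the signal that the same pages are being frozen and then touched again in a tight loop. What counts as "large" is left open here, as it is in the discussion above.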

--
Robert Haas
EDB: http://www.enterprisedb.com
