Re: New strategies for freezing, advancing relfrozenxid early

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2023-01-27 07:11:41
Message-ID: CAH2-Wz=uonC1dOVrBJqtPzzVQ0Cet2uBvzHSQphS4zA_0QCVfg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 26, 2023 at 9:58 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> It doesn't seem like a great proxy to me. ISTM that this means that how
> aggressive vacuum is about opportunistically freezing pages depends on config
> variables like checkpoint_timeout & max_wal_size (less common opportunistic
> freezing), full_page_writes & use of unlogged tables (no opportunistic
> freezing), and the largely random scheduling of autovac workers.

The FPI thing was originally supposed to complement the freezing
strategies stuff, and possibly other rules that live in
lazy_scan_prune. Obviously you can freeze a page by following any rule
that you care to invent -- you can decide by calling random(). Two
rules can coexist during the same VACUUM (actually, they do already).

> Essentially the "any fpi" logic is a very coarse grained way of using the page
> LSN as a measurement. As I said, I don't think "has a checkpoint occurred
> since the last write" is a good metric to avoid unnecessary freezing - it's
> too coarse. But I think using the LSN is the right thought. What about
> something like
>
> lsn_threshold = insert_lsn - (insert_lsn - lsn_of_last_vacuum) * 0.1
> if (/* other conds */ && PageGetLSN(page) <= lsn_threshold)
> FreezeMe();
>
> I probably got some details wrong, what I am going for with lsn_threshold is
> that we'd freeze an already dirty page if it's not been updated within 10% of
> the LSN distance to the last VACUUM.

It seems to me that you're reinventing something akin to eager
freezing strategy here. At least that's how I define it, since now
you're bringing the high level context into it; what happens with the
table, with VACUUM operations, and so on. Obviously this requires
tracking the metadata that you suppose will be available in some way
or other, in particular things like lsn_of_last_vacuum.

What about unlogged/temporary tables? The obvious thing to do there is
what I did in the patch that was reverted (freeze whenever the page
will thereby become all-frozen), and forget about LSNs. But you have
already objected to that part, specifically.

BTW, you still haven't changed the fact that you get rather different
behavior with checksums/wal_log_hints. I think that that's good, but
you didn't seem to.

> I don't think the speculation is that fundamentally different - a heavily
> updated table with a bit of a historic, non-changing portion, makes
> vacuum_freeze_strategy_threshold freeze way more aggressively than either "any
> record" or "any fpi".

That's true. The point I was making is that both this proposal and
eager freezing are based on some kind of high level picture of the
needs of the table, based on high level metadata. To me that's the
defining characteristic.

> > And even when we lose, you generally still won't have been completely
> > wrong -- even then there generally will indeed be a second FPI later
> > on for the same page, to go with everything else. This makes the
> > wasted freezing even less significant, on a comparative basis!
>
> This is precisely why I think that we can afford to be quite aggressive about
> freezing already dirty pages...

I'm beginning to warm to this idea, now that I understand it a little better.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2023-01-27 07:18:43 Re: Syncrep and improving latency due to WAL throttling
Previous Message Thomas Munro 2023-01-27 07:10:32 Re: Reducing power consumption on idle servers