Re: New strategies for freezing, advancing relfrozenxid early

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2022-08-29 18:47:16
Message-ID: fcd8bda4474a319dc563447e7f8f4e2147500b8c.camel@j-davis.com
Lists: pgsql-hackers

On Thu, 2022-08-25 at 14:21 -0700, Peter Geoghegan wrote:
> The main high level goal of this work is to avoid painful, disruptive
> antiwraparound autovacuums (and other aggressive VACUUMs) that do way
> too much "catch up" freezing, all at once, causing significant
> disruption to production workloads.

Sounds like a good goal, and loosely follows the precedent of
checkpoint targets and vacuum cost delays.

> A new GUC/reloption called vacuum_freeze_strategy_threshold is added
> to control freezing strategy (also influences our choice of skipping
> strategy). It defaults to 4GB, so tables smaller than that cutoff
> (which are usually the majority of all tables) will continue to freeze
> in much the same way as today by default. Our current lazy approach to
> freezing makes sense there, and should be preserved for its own sake.

Why is the threshold per-table? Imagine someone who has a bunch of 4GB
partitions that add up to a huge amount of deferred freezing work.

The initial problem you described is a system-level problem, so it
seems we should track the overall debt in the system in order to keep
up.
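
To make the partition concern concrete, here is a rough sketch in
Python. It is illustrative only: the partition count, the sizes, the
helper name, and the ">=" comparison are all assumptions; only the 4GB
default comes from the quoted proposal. Every partition stays under the
per-table cutoff, so none would switch strategy, yet the total is large.

```python
GB = 1024 ** 3

# Default of the proposed vacuum_freeze_strategy_threshold (from the quoted text).
FREEZE_STRATEGY_THRESHOLD = 4 * GB


def tables_over_threshold(table_sizes, threshold=FREEZE_STRATEGY_THRESHOLD):
    """Hypothetical helper: names of tables at or above the per-table cutoff.

    Whether the real comparison is ">" or ">=" is an assumption here.
    """
    return [name for name, size in table_sizes.items() if size >= threshold]


# Hypothetical workload: 500 partitions, each one byte under the cutoff.
partitions = {f"part_{i}": 4 * GB - 1 for i in range(500)}

over = tables_over_threshold(partitions)
total_under_cutoff_bytes = sum(partitions.values())

print("tables over the per-table cutoff:", len(over))
print("total size left on the lazy path (GB):", total_under_cutoff_bytes / GB)
```

A per-table test sees nothing unusual in this case, while a system-wide
total would show close to 2000GB handled by the lazy strategy.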

> for this table, at this time: Is it more important to advance
> relfrozenxid early (be eager), or to skip all-visible pages instead
> (be lazy)? If it's the former, then we must scan every single page
> that isn't all-frozen according to the VM snapshot (including every
> all-visible page).

This feels too absolute to me. If the goal is to freeze more
incrementally, well in advance of wraparound limits, then why can't we
just freeze 1000 out of 10000 freezable pages on this run, and then
leave the rest for a later run?
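
A minimal sketch of that per-run budget idea, in Python. The function
name and the list-of-page-numbers representation are hypothetical, and
it ignores everything VACUUM actually has to consider when choosing
pages; it only shows the "freeze some now, leave the rest" shape.

```python
def split_freeze_work(freezable_pages, budget):
    """Return (pages to freeze in this run, pages left for a later run).

    `budget` caps how many pages a single run is allowed to freeze.
    """
    return freezable_pages[:budget], freezable_pages[budget:]


# The numbers from the text: 10000 freezable pages, 1000 per run.
remaining = list(range(10000))
runs = 0
while remaining:
    frozen_now, remaining = split_freeze_work(remaining, 1000)
    runs += 1

print("runs needed to work through the backlog:", runs)
```

With these numbers the backlog is cleared over ten runs rather than in
one pass.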

> Thoughts?

What if we thought about this more like a "background freezer"? It
would keep track of the total number of unfrozen pages in the system,
and freeze them at some kind of controlled/adaptive rate.

Regular autovacuum's job would be to keep advancing relfrozenxid for
all tables and to do other cleanup, and the background freezer's job
would be to keep the absolute number of unfrozen pages under some
limit. Conceptually those two jobs seem different to me.
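
As a toy model of what "controlled/adaptive rate" could mean, here is a
sketch in Python. Nothing in it is an existing PostgreSQL mechanism:
the function names, the rate formula, and all of the numbers are
assumptions picked for illustration. The freezer works at a low base
rate while the system-wide count of unfrozen pages is under the limit,
and speeds up in proportion to the excess once the limit is crossed.

```python
def freeze_rate(unfrozen_pages, limit, base_rate, max_rate):
    """Pages to freeze in the next interval (hypothetical policy)."""
    if unfrozen_pages <= limit:
        return base_rate
    excess = unfrozen_pages - limit
    # Work off the excess on top of the base rate, capped at max_rate.
    return min(max_rate, base_rate + excess)


def simulate(intervals, new_unfrozen_per_interval, limit, base_rate, max_rate):
    """Track the system-wide unfrozen page count at the end of each interval."""
    unfrozen = 0
    history = []
    for _ in range(intervals):
        unfrozen += new_unfrozen_per_interval   # pages dirtied by the workload
        rate = freeze_rate(unfrozen, limit, base_rate, max_rate)
        unfrozen -= min(rate, unfrozen)         # background freezer catches up
        history.append(unfrozen)
    return history


history = simulate(intervals=50, new_unfrozen_per_interval=500,
                   limit=10000, base_rate=100, max_rate=2000)
print("peak unfrozen pages after each interval:", max(history))
```

In this model the count stays under the limit only as long as the
workload creates unfrozen pages more slowly than max_rate allows; a
real design would also need to decide which tables to freeze from.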

Also, regarding patch v1-0001-Add-page-level-freezing, do you think
that narrows the conceptual gap between an all-visible page and an
all-frozen page?

Regards,
Jeff Davis
