Re: New strategies for freezing, advancing relfrozenxid early

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2022-08-30 18:11:41
Message-ID: 1892cc797f86924e31f00fd3c703ca464936c65e.camel@j-davis.com
Lists: pgsql-hackers

On Thu, 2022-08-25 at 14:21 -0700, Peter Geoghegan wrote:
> Attached patch series is a completely overhauled version of earlier
> work on freezing. Related work from the Postgres 15 cycle became
> commits 0b018fab, f3c15cbe, and 44fa8488.
>
> Recap
> =====
>
> The main high level goal of this work is to avoid painful, disruptive
> antiwraparound autovacuums (and other aggressive VACUUMs) that do way
> too much "catch up" freezing, all at once

I agree with the motivation: that keeping around a lot of deferred work
(unfrozen pages) is risky, and that administrators would want a way to
control that risk.

The solution involves more changes to the philosophy and mechanics of
vacuum than I would expect, though. For instance, VM snapshotting,
page-level freezing, and a cost model might all make sense, but I don't
see why they are critical for solving the problem above. I think I'm
still missing something. My mental model is closer to the bgwriter and
checkpoint_completion_target.

Allow me to make a naive counter-proposal (not a real proposal, just so
I can better understand the contrast with your proposal):

* introduce a reloption unfrozen_pages_target (default -1, meaning
infinity, which is the current behavior)
* introduce two fields to LVRelState: n_pages_frozen and
delay_skip_count, both initialized to zero
* when marking a page frozen: n_pages_frozen++
* when vacuum begins:
      if (unfrozen_pages_target >= 0 &&
          current_unfrozen_page_count > unfrozen_pages_target)
      {
          vacrel->delay_skip_count = current_unfrozen_page_count -
              unfrozen_pages_target;
          /* ?also use more aggressive freezing thresholds? */
      }
* in lazy_scan_skip(), have a final check:
      if (vacrel->n_pages_frozen < vacrel->delay_skip_count)
      {
          break;
      }

I know there would still be some problem cases, but to me it seems like
we solve 80% of the problem in a couple dozen lines of code.

a. Can you clarify some of the problem cases, and why it's worth
spending more code to fix them?

b. How much of your effort is groundwork for related future
improvements? If it's a substantial part, can you explain it in that
larger context?

c. Can some of your patches be separated into independent discussions?
For instance, patch 1 has been discussed in other threads and seems
independently useful, and I don't see the current work as dependent on
it. Patch 4 also seems largely independent.

d. Can you give me a sense of the scale of the problems solved by
visibilitymap snapshots and the cost model? Do those need to be in v1?

Regards,
Jeff Davis
