From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie> |
Subject: | Re: Eagerly scan all-visible pages to amortize aggressive vacuum |
Date: | 2025-02-04 02:09:30 |
Message-ID: | bepzdxjhmgueoiddeemzhhgvx4nr2ysrttrgyxidask7m5crzt@sws4c5adn544 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2025-01-29 14:12:52 -0500, Melanie Plageman wrote:
> From 71f32189aad510b73d221fb0478ffd916e5e5dde Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Mon, 27 Jan 2025 12:23:00 -0500
> Subject: [PATCH v14] Eagerly scan all-visible pages to amortize aggressive
> vacuum
>
> Amortize the cost of an aggressive vacuum by eagerly scanning some
> all-visible but not all-frozen pages during normal vacuums.
I think it'd be good to explain the problem this is trying to address a bit
more (i.e. the goal is to avoid the situation which a lot of work is deferred
until an aggressive vacuum, which is then very expensive).
> Because the goal is to freeze these all-visible pages, all-visible pages
> that are eagerly scanned and set all-frozen in the visibility map are
> counted as successful eager freezes and those not frozen are considered
> failed eager freezes.
I don't really understand this sentence. "Because the goal is to freeze these
all-visible pages" doesn't really relate with the rest of the sentence.
> + <varlistentry id="guc-vacuum-max-eager-freeze-failure-rate" xreflabel="vacuum_max_eager_freeze_failure_rate">
> + <term><varname>vacuum_max_eager_freeze_failure_rate</varname> (<type>floating point</type>)
> + <indexterm>
> + <primary><varname>vacuum_max_eager_freeze_failure_rate</varname> configuration parameter</primary>
> + </indexterm>
> + </term>
> + <listitem>
> + <para>
> + Specifies the maximum fraction of pages that
> + <command>VACUUM</command> may scan and <emphasis>fail</emphasis> to set
> + all-frozen in the visibility map before disabling eager scanning. A
> + value of <literal>0</literal> disables eager scanning altogether. The
> + default is <literal>0.03</literal> (3%).
> + </para>
Fraction of what?
> + <para>
> + Note that when eager scanning is enabled, successful page freezes
> + do not count against this limit, although they are internally
> + capped at 20% of the all-visible but not all-frozen pages in the
> + relation. Capping successful page freezes helps amortize the
> + overhead across multiple normal vacuums.
> + </para>
What does it mean that they are not counted, but are capped?
> + <para>
> + If a table is building up a backlog of all-visible but not all-frozen
> + pages, a normal vacuum may choose to scan skippable pages in an effort to
> + freeze them. Doing so decreases the number of pages the next aggressive
> + vacuum must scan. These are referred to as <firstterm>eagerly
> + scanned</firstterm> pages. Eager scanning can be tuned to attempt
> + to freeze more all-visible pages by increasing
> + <xref linkend="guc-vacuum-max-eager-freeze-failure-rate"/>. Even if eager
> + scanning has kept the number of all-visible but not all-frozen pages to a
> + minimum, most tables still require periodic aggressive vacuuming.
> + </para>
Maybe mention that the aggressive vacuuming will often be cheaper than without
eager freezing, even if necessary?
> + * Normal vacuums count all-visible pages eagerly scanned as a success when
> + * they are able to set them all-frozen in the VM and as a failure when they
> + * are not able to set them all-frozen.
Maybe some more punctuation would make this more readable? Or a slight
rephrasing?
> + * Because we want to amortize the overhead of freezing pages over multiple
> + * vacuums, normal vacuums cap the number of successful eager freezes to
> + * MAX_EAGER_FREEZE_SUCCESS_RATE of the number of all-visible but not
> + * all-frozen pages at the beginning of the vacuum. Once the success cap has
> + * been hit, eager scanning is disabled for the remainder of the vacuum of the
> + * relation.
It also caps the maximum "downside" of freezing eagerly, right? Seems worth
mentioning.
> + /*
> + * Now calculate the eager scan start block. Start at a random spot
> + * somewhere within the first eager scan region. This avoids eager
> + * scanning and failing to freeze the exact same blocks each vacuum of the
> + * relation.
> + */
If I understand correctly, we're not really choosing a spot inside the first
eager scan region, we determine the bounds of the first region?
> @@ -930,16 +1188,21 @@ lazy_scan_heap(LVRelState *vacrel)
> vacrel->current_block = InvalidBlockNumber;
> vacrel->next_unskippable_block = InvalidBlockNumber;
> vacrel->next_unskippable_allvis = false;
> + vacrel->next_unskippable_eager_scanned = false;
> vacrel->next_unskippable_vmbuffer = InvalidBuffer;
>
> - while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm))
> + while (heap_vac_scan_next_block(vacrel, &blkno, &all_visible_according_to_vm,
> + &was_eager_scanned))
Pedantic^3: Is past tense really appropriate? We *will* be scanning that page
in the body of the loop, right?
> @@ -1064,7 +1327,45 @@ lazy_scan_heap(LVRelState *vacrel)
> if (got_cleanup_lock)
> lazy_scan_prune(vacrel, buf, blkno, page,
> vmbuffer, all_visible_according_to_vm,
> - &has_lpdead_items);
> + &has_lpdead_items, &vm_page_frozen);
> +
> + /*
> + * Count an eagerly scanned page as a failure or a success.
> + */
> + if (was_eager_scanned)
Hm - how should pages be counted that we couldn't get a lock on? I think
right now they'll be counted as a failure, but that doesn't seem quite right.
> diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
> index 0ab921a169b..32a1b8c46a1 100644
> --- a/src/backend/postmaster/autovacuum.c
> +++ b/src/backend/postmaster/autovacuum.c
> @@ -2826,6 +2826,12 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
> tab->at_params.is_wraparound = wraparound;
> tab->at_params.log_min_duration = log_min_duration;
> tab->at_params.toast_parent = InvalidOid;
> +
> + /*
> + * Later we check reloptions for vacuum_max_eager_freeze_failure_rate
> + * override
> + */
> + tab->at_params.max_eager_freeze_failure_rate = vacuum_max_eager_freeze_failure_rate;
> tab->at_storage_param_vac_cost_limit = avopts ?
> avopts->vacuum_cost_limit : 0;
> tab->at_storage_param_vac_cost_delay = avopts ?
I'd mention where that is, so that a reader of that comment doesn't have to
search around...
I think this is pretty close!
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2025-02-04 03:05:07 | Re: SQL/JSON json_table plan clause |
Previous Message | jian he | 2025-02-04 02:04:36 | Re: Non-text mode for pg_dumpall |