Re: Berserk Autovacuum (let's save next Mandrill)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date: 2020-03-13 21:38:51
Message-ID: 20200313213851.ejrk5gptnmp65uoo@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-03-13 13:44:42 -0500, Justin Pryzby wrote:
> As I understand, the initial motivation of this patch was to avoid disruptive
> anti-wraparound vacuums on insert-only tables. But if vacuum is triggered at
> all, it will freeze the oldest tuples, which is all that's needed; especially
> since fd31cd2651 "Don't vacuum all-frozen pages.", those pages would never need
> to be vacuumed again. Recently written tuples wouldn't be frozen, which is OK;
> they're handled next time.
>
> Another motivation of the patch is to allow index-only scans, for which the
> planner looks at the table's "relallvisible" fraction (and at execution time,
> if a page isn't all-visible, visits the heap). Again, that happens if vacuum
> runs at all. Again, some pages won't be marked all-visible, which is fine;
> they're handled next time.
>
> I think freeze_min_age=0 could negatively affect people who have insert-mostly
> tables (I'm not concerned, but that includes us). If they consistently hit the
> autovacuum insert threshold before the cleanup threshold for updated/deleted
> tuples, any updated/deleted tuples would be frozen, which would be
> wasteful:

I think that's a valid concern.

> |One disadvantage of decreasing vacuum_freeze_min_age is that it might cause
> |VACUUM to do useless work: freezing a row version is a waste of time if the row
> |is modified soon thereafter (causing it to acquire a new XID). So the setting
> |should be large enough that rows are not frozen until they are unlikely to
> |change any more.

I think the overhead here might be a bit overstated. Once a page is
dirtied (or is already dirty) during vacuum, and we freeze a single row
(necessitating WAL logging), there's not really a good reason not to also
freeze the rest of the rows on that page. The added cost of freezing
another row is minuscule compared to the "constant" cost of freezing
anything on the page. It's of course different if there are otherwise
no tuples worth freezing on the page (not uncommon). But there's really
no reason for that to have to be the case:

Afaict the only problem with freezing more aggressively whenever we touch
the page anyway (beyond setting hint bits) is that we commonly end up with
multiple WAL records for the same page:

1) lazy_scan_heap()->heap_page_prune() will log an XLOG_HEAP2_CLEAN record,
   but leave itemids in place most of the time
2) lazy_scan_heap()->log_heap_freeze() will log an XLOG_HEAP2_FREEZE_PAGE record
3a) if no indexes exist / index cleanup is disabled:
   lazy_scan_heap()->lazy_vacuum_page() will log an XLOG_HEAP2_CLEAN
   record, removing dead tuples (including itemids)
3b) if indexes need to be cleaned up:
   lazy_vacuum_heap()->lazy_vacuum_page() will log an XLOG_HEAP2_CLEAN record
which is not nice. It likely is worth merging xl_heap_freeze_page into
xl_heap_clean, and having heap pruning always freeze once it decides to
dirty a page.
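
As a rough illustration of that direction, a merged record could carry the
union of the two payloads. This is purely a hypothetical sketch - the struct
name and layout are invented, only the field names borrowed from
xl_heap_clean and xl_heap_freeze_page correspond to the existing records:

#include <stdint.h>

/*
 * Hypothetical sketch only -- not an existing PostgreSQL WAL record.
 * The idea: one record per page that carries what xl_heap_clean holds
 * today plus what xl_heap_freeze_page adds, so pruning and freezing a
 * page needs a single record instead of two.
 */
typedef uint32_t TransactionId;		/* uint32 in PostgreSQL proper */

typedef struct xl_heap_clean_and_freeze
{
	TransactionId	latestRemovedXid;	/* as in xl_heap_clean */
	TransactionId	cutoff_xid;		/* as in xl_heap_freeze_page */
	uint16_t	nredirected;
	uint16_t	ndead;
	uint16_t	nfrozen;
	/*
	 * Payload registered with block reference 0: redirect pairs, dead and
	 * unused offsets, then nfrozen per-tuple freeze plans.
	 */
} xl_heap_clean_and_freeze;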

We could probably always prune dead tuples as part of heap_prune_chain()
if there are no indexes - but I'm doubtful it's worth it, since there will
be few tables with lots of dead tuples that don't have indexes.

Merging 3b's WAL record would be harder, I think.
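
To put rough numbers on the amortization point above - that freezing
additional tuples on a page we're dirtying and WAL-logging anyway is nearly
free - here is a toy calculation. The byte figures are invented assumptions
(a full-page image dominating the per-page cost, a small per-tuple freeze
payload), not measured record sizes:

#include <stdio.h>

int
main(void)
{
	/* Assumed, illustrative costs - not actual xl_heap_* sizes. */
	const double per_page_overhead = 8192;	/* e.g. a full-page image */
	const double per_tuple_freeze = 12;	/* per-tuple freeze payload */

	for (int ntuples = 1; ntuples <= 64; ntuples *= 4)
	{
		double		total = per_page_overhead + per_tuple_freeze * ntuples;

		printf("%2d tuples frozen: %6.0f bytes WAL, %4.1f%% of it per-tuple\n",
			   ntuples, total, 100.0 * per_tuple_freeze * ntuples / total);
	}
	return 0;
}

Even at 64 frozen tuples the per-tuple portion stays under a tenth of the
per-page cost in this (assumed) model, which is the shape of the argument.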

There's also a significant source of additional WAL records here, one
that I think should really not have been introduced:

4) HeapTupleSatisfiesVacuum(), called both by heap_prune_chain() and
lazy_scan_heap(), will often trigger a WAL record when checksums or
wal_log_hints are enabled. If the page hasn't been modified in the
current checkpoint window (extremely common for VACUUM, reasonably
common for opportunistic pruning), we will log a full-page write.

Imo this really should have been avoided when checksums were added;
it's a pretty substantial and unnecessary increase in overhead.
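
For a sense of the magnitude, here is a toy estimate of the extra WAL one
vacuum pass could emit just for these hint-bit full-page images; the table
size and the fraction of pages needing an FPI are assumptions for
illustration, not measurements:

#include <stdio.h>

int
main(void)
{
	/* Assumptions, purely illustrative. */
	const double page_kb = 8.0;		/* default block size */
	const long	pages_scanned = 1000000;	/* ~8 GB table */
	const double frac_needing_fpi = 0.9;	/* pages not WAL-logged since
						 * the last checkpoint */

	double		extra_wal_gb = pages_scanned * frac_needing_fpi * page_kb
		/ (1024.0 * 1024.0);

	printf("extra WAL from hint-bit FPIs: ~%.1f GB\n", extra_wal_gb);
	return 0;
}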

It's probably overkill to tie fixing the 'insert only' case to improving
the WAL logging for vacuuming / pruning. But it certainly would
largely remove the tradeoff discussed here, by removing the additional
overhead of freezing in tables that are also updated.

> Also, there was a discussion about index cleanup with the conclusion that it
> was safer not to skip it, since otherwise indexes might bloat. I think that's
> right, since vacuum for cleanup is triggered by the number of dead heap tuples.
> To skip index cleanup, I think you'd want a metric for
> n_dead_since_index_cleanup. (Or maybe analyze could track dead index tuples
> and trigger vacuum of each index separately).
>
> Having now played with the patch, I'll suggest that 10000000 is too high a
> threshold. If autovacuum runs without FREEZE, I don't see why it couldn't be
> much lower (100000?) or use (0.2 * n_ins + 50) like the other autovacuum GUC.

ISTM that the danger of regressing workloads by suddenly and repeatedly
scanning huge indexes that previously were never / rarely scanned is
significant (if there are a few dead tuples, that is; otherwise most
indexes will be able to skip the scan since the introduction of
vacuum_cleanup_index_scale_factor).
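
For reference, a toy comparison of the trigger points being discussed,
reading Justin's "(0.2 * n_ins + 50)" as having the same shape as the
existing autovacuum_vacuum_scale_factor formula (a base threshold plus a
fraction of the table's reltuples); the table sizes below are made-up
examples:

#include <stdio.h>

int
main(void)
{
	const double fixed_threshold = 10000000;	/* value Justin finds too high */
	const double low_fixed = 100000;	/* his lower fixed suggestion */
	const double base = 50, scale = 0.2;	/* scale-factor style suggestion */
	const double reltuples[] = {1e4, 1e6, 1e8, 1e10};

	for (int i = 0; i < 4; i++)
	{
		double		scaled = base + scale * reltuples[i];

		printf("reltuples %.0e: scaled trigger = %.0f inserts (fixed: %.0f or %.0f)\n",
			   reltuples[i], scaled, low_fixed, fixed_threshold);
	}
	return 0;
}

The lower the trigger point for a big table, the more often its indexes may
get scanned when some dead tuples are present - which is the regression risk
described above.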

Greetings,

Andres Freund
