Re: Make autovacuum sort tables in descending order of xid_age

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Christophe Pettus <xof(at)thebuild(dot)com>
Subject: Re: Make autovacuum sort tables in descending order of xid_age
Date: 2020-01-09 17:23:46
Message-ID: CA+TgmoZPsiERnWf2eFSk7F2AkmOymhQKGyVFsB5t4BPim7p5Og@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 12, 2019 at 2:26 PM David Fetter <david(at)fetter(dot)org> wrote:
> > I wonder if you might add information about table size, table changes,
> > and bloat to your RelFrozenXidAge struct and modify rfxa_comparator to
> > use a heuristic to cost the (age, size, bloat, changed) grouping and
> > sort on that cost, such that really large bloated tables with old xids
> > might get vacuumed before smaller, less bloated tables that have
> > even older xids. Sorting the tables based purely on xid_age seems to
> > ignore other factors that are worth considering. I do not have a
> > formula for how those four factors should be weighted in the heuristic,
> > but you are implicitly assigning three of them a weight of zero in
> > your current patch.
>
> I think it's vastly premature to come up with complex sorting systems
> right now. Just sorting in descending order of age should either have
> or not have positive effects.

A lot of previous efforts to improve autovacuum scheduling have fallen
down precisely because they did something that was so simple that it
was doomed to regress as many cases as it improved, so I wouldn't be
too quick to dismiss Mark's suggestion. In general, sorting by XID age
seems like it should be better, but it's not hard to come up with a
counterexample: suppose table T1 is going to wrap around in 4 hours
and takes 4 hours to vacuum, but table T2 is going to wrap around in 2
hours and takes 1 hour to vacuum. Your algorithm will prioritize T2,
but it's better to prioritize T1. A second autovacuum worker may
become available for this database later and still get T2 done before
we run into trouble, but if we don't start T1 right now, we're hosed.
The current algorithm gets this right if T1 was defined before T2 and
thus appears earlier in pg_class; your algorithm gets it wrong
regardless.
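To make the T1/T2 counterexample concrete, here is a small Python sketch that compares ordering by XID age with ordering by "slack", meaning hours until wraparound minus hours needed to vacuum. The slack rule and the `Table` type are illustrative assumptions chosen for this example, not something proposed in the thread or implemented in autovacuum. The hour figures are the hypothetical ones from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    hours_to_wraparound: float  # hypothetical: time left before wraparound trouble
    hours_to_vacuum: float      # hypothetical: estimated vacuum duration

    @property
    def slack(self) -> float:
        # How long we can wait before starting and still finish in time.
        return self.hours_to_wraparound - self.hours_to_vacuum


def order_by_age(tables):
    # Oldest relfrozenxid first, i.e. least time to wraparound first.
    return [t.name for t in sorted(tables, key=lambda t: t.hours_to_wraparound)]


def order_by_slack(tables):
    # Least slack first: start whatever can least afford to wait.
    return [t.name for t in sorted(tables, key=lambda t: t.slack)]


tables = [Table("T1", 4, 4), Table("T2", 2, 1)]
print(order_by_age(tables))    # ['T2', 'T1']
print(order_by_slack(tables))  # ['T1', 'T2']
```

With T1 started immediately, T2 still has one hour of slack, so a second worker that frees up within that hour can finish it in time.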

I've had the thought for a while now that perhaps we ought to try to
estimate the rate of XID consumption, because without that it's really
hard to make smart decisions. In the above example, if the rate of XID
consumption is 4x slower, then it might be smarter to vacuum T2 first,
especially if T2 is very heavily updated compared to T1 and might
bloat if we don't deal with it right away. At the lower rate of XID
consumption, T1 is an urgent problem, but not yet an emergency.
However, I've noticed that most people who complain about unexpected
wraparound vacuums have them hit during peak periods, which, when you think
about it, makes a lot of sense. If you consume XIDs 10x as fast during
your busy times as during your quiet times, then the XID that triggers the
wraparound scan on any given table is very likely to occur during a
busy period. So the *current* rate of XID consumption might not be
very informative, which makes figuring out what to do here awfully
tricky.
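The rate dependence described above can be sketched in a few lines of Python. The XID counts, burn rates, and the "least slack first" ordering are illustrative assumptions picked to reproduce the T1/T2 example; autovacuum does not currently estimate XID consumption rates at all.

```python
def hours_left(xids_left: int, xids_per_hour: float) -> float:
    """Hours until a table runs out of XID headroom at a given burn rate."""
    return xids_left / xids_per_hour


def least_slack_first(tables, xids_per_hour):
    """Order (name, xids_left, hours_to_vacuum) tuples by slack, smallest first.

    Slack = hours of headroom minus hours the vacuum itself will take.
    """
    def slack(table):
        _name, xids_left, vacuum_hours = table
        return hours_left(xids_left, xids_per_hour) - vacuum_hours

    return [name for name, _, _ in sorted(tables, key=slack)]


# Hypothetical tables: (name, XIDs of headroom left, hours to vacuum)
TABLES = [("T1", 4_000_000, 4.0), ("T2", 2_000_000, 1.0)]

print(least_slack_first(TABLES, 1_000_000))  # ['T1', 'T2']: T1 has no slack at all
print(least_slack_first(TABLES, 250_000))    # ['T2', 'T1']: at 4x slower, T1 can wait

# The same 4M XIDs of headroom look very different at quiet vs. peak rates:
print(hours_left(4_000_000, 100_000), hours_left(4_000_000, 1_000_000))  # 40.0 4.0
```

The last line is the crux of the difficulty: a rate sampled during a quiet period (100k XIDs/hour here) promises 40 hours of headroom, while a 10x busier peak leaves only 4.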

I think Mark's suggestion of some kind of formula that takes into
account the XID age as well as table size and bloat is probably a
pretty good one. We'll probably need to make some of the parameters of
that formula configurable. Ideally, they'll be easy enough to
understand that users can say "oh, I'm using XIDs more or less quickly
than normal here, so I need to change parameter X" and even figure out
-- without using a calculator -- what sort of value for X might be
appropriate.

When there's a replication slot or prepared transaction or open
transaction holding back xmin, you can't advance the relfrozenxid of
that table past that point no matter how aggressively you vacuum it,
so it would probably be a good idea to set up the formula so that the
weight is based on the amount by which we think we'll be able to
advance relfrozenxid rather than, say, the age relative to the last
XID assigned.
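A rough Python sketch of weighting by achievable advancement rather than raw age follows. `oldest_xmin` is a hypothetical input standing for the oldest xmin held by any replication slot, prepared transaction, or open transaction. The modular arithmetic handles 32-bit wraparound but ignores PostgreSQL's reserved special XIDs (values below 3) and the 2^31 comparison window, so treat it as arithmetic only.

```python
XID_MODULUS = 2**32  # XIDs are 32-bit counters that wrap around


def xid_age(xid: int, next_xid: int) -> int:
    """Distance from xid forward to next_xid in the circular XID space."""
    return (next_xid - xid) % XID_MODULUS


def advanceable_xids(relfrozenxid: int, oldest_xmin: int, next_xid: int) -> int:
    """Roughly how far relfrozenxid could move forward if we vacuumed now.

    oldest_xmin (hypothetical) is the oldest xmin still held back by a
    replication slot, prepared transaction, or open transaction.
    """
    gain = xid_age(relfrozenxid, next_xid) - xid_age(oldest_xmin, next_xid)
    return max(0, gain)  # nothing to gain if the horizon is older still


# Old table, but a stale replication slot pins the horizon just ahead of it:
print(advanceable_xids(100_000_000, 150_000_000, 1_000_000_000))  # 50000000
# Same table with nothing holding xmin back:
print(advanceable_xids(100_000_000, 999_000_000, 1_000_000_000))  # 899000000
```

Both calls involve a table of the same age (900 million XIDs), yet the first can gain only 50 million, which is why raw age alone would overstate its priority.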

The dominant cost of vacuuming a table is often the number and size of
the indexes rather than the size of the heap, particularly because the
visibility map may permit skipping a lot of the heap. So you have N
indexes that need to be read completely and 1 heap that needs to be
read only partially. So, whatever portion of the score comes from
estimating the cost of vacuuming that table ought to factor in the
size of the indexes. Perhaps it should also consider the contents of
the visibility map, although I'm less sure about that.
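The cost side can be sketched as a page count: every index page, plus whatever part of the heap the visibility map does not let VACUUM skip. `RelSize` and `vacuum_effort` are hypothetical names, and the `use_visibility_map` flag reflects the open question about whether to consider the map at all. Note that for an aggressive (anti-wraparound) vacuum only all-frozen pages can be skipped, so the skippable count would have to come from the all-frozen bits in that case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RelSize:
    heap_pages: int
    skippable_pages: int  # pages the visibility map would let VACUUM skip
    index_pages: List[int] = field(default_factory=list)  # one entry per index


def vacuum_effort(rel: RelSize, use_visibility_map: bool = True) -> int:
    """Estimated pages VACUUM must read: all index pages plus part of the heap."""
    index_cost = sum(rel.index_pages)  # every index is scanned in full
    heap_cost = rel.heap_pages
    if use_visibility_map:
        heap_cost -= rel.skippable_pages  # only the non-skippable heap pages
    return index_cost + heap_cost


# Mostly all-visible heap with two sizable indexes:
rel = RelSize(heap_pages=1_000, skippable_pages=900, index_pages=[500, 300])
print(vacuum_effort(rel))                            # 900
print(vacuum_effort(rel, use_visibility_map=False))  # 1800
```

In the first call the indexes account for 800 of the 900 pages, which illustrates why they can dominate the cost even when the heap is larger than both indexes combined.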

One problem with the exponential in Mark's formula is that it might
treat small XID differences between old tables as more important than
they really are. I wonder if it might be a better idea to compute
several different quantities and use the maximum from among them as
the prioritization. We can model the priority of vacuuming a
particular table as the benefit of vacuuming that table multiplied by
the effort. The effort is easy to model: just take the size of the
table and its indexes. The benefit is trickier, because there are four
different possible benefits: relfrozenxid advancement, relminmxid
advancement, dead tuple removal, and marking pages all-visible. So,
suppose we model each benefit by a separate equation. For XID
advancement, figure out the difference between relfrozenxid and
RecentGlobalXmin; if it's less than vacuum_freeze_min_age, then 0;
else multiply the amount in excess of vacuum_freeze_min_age by some
constant. Analogously for MXID advancement. For bloat, the number of
dead tuples multiplied by some other constant, presumably smaller. For
marking pages all-visible, if we want to factor that in, the number of
pages that are not currently all-visible multiplied by the smallest
constant of all. Take the highest of those benefits and multiply by
the size of the table and its indexes to find the priority.
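The max-of-benefits model in the previous paragraph can be written down directly as a Python sketch. The four weights are hypothetical placeholders (the paragraph only says they shrink from XID advancement to dead tuples to all-visible marking). The two freeze-age constants are PostgreSQL's default settings; using `vacuum_multixact_freeze_min_age` for the "analogous" MXID term is an assumption. `RelStats` and its fields are illustrative names, not existing structures.

```python
from dataclasses import dataclass

# PostgreSQL's default values for the freeze-age settings.
VACUUM_FREEZE_MIN_AGE = 50_000_000
VACUUM_MULTIXACT_FREEZE_MIN_AGE = 5_000_000

# Hypothetical weights, decreasing as the email suggests.
XID_WEIGHT = 100.0
MXID_WEIGHT = 100.0
DEAD_TUPLE_WEIGHT = 1.0
NOT_ALL_VISIBLE_WEIGHT = 0.1


@dataclass
class RelStats:
    xid_distance: int           # relfrozenxid -> RecentGlobalXmin
    mxid_distance: int          # relminmxid -> oldest multixact horizon (hypothetical)
    dead_tuples: int
    not_all_visible_pages: int
    total_pages: int            # heap plus all indexes: the "effort"


def excess(distance: int, min_age: int) -> int:
    """Zero until distance passes min_age, then the amount beyond it."""
    return max(0, distance - min_age)


def priority(r: RelStats) -> float:
    benefits = (
        excess(r.xid_distance, VACUUM_FREEZE_MIN_AGE) * XID_WEIGHT,
        excess(r.mxid_distance, VACUUM_MULTIXACT_FREEZE_MIN_AGE) * MXID_WEIGHT,
        r.dead_tuples * DEAD_TUPLE_WEIGHT,
        r.not_all_visible_pages * NOT_ALL_VISIBLE_WEIGHT,
    )
    # Keep only the largest benefit, then scale it by table + index size.
    return max(benefits) * r.total_pages


old = RelStats(60_000_000, 0, 10_000, 5_000, 1_000)
young = RelStats(40_000_000, 0, 200_000, 1_000, 50)
print(priority(old))    # 1000000000000.0  (XID term wins)
print(priority(young))  # 10000000.0        (dead-tuple term wins)
```

Taking the maximum rather than the sum keeps one kind of benefit from being double-counted with another, and the `excess` cutoff means XID age contributes nothing until a table is actually past `vacuum_freeze_min_age`.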

Whatever formula we use exactly, we want XID-age to be the dominant
consideration for tables that are in real wraparound danger, but, I
think, not to the complete exclusion of table size and bloat
considerations. There is certainly a point at which a table is so near
wraparound that it needs to take precedence over tables that are just
being vacuumed for bloat, but you don't want that to happen
unnecessarily, because bloat is *really* bad. And you don't
necessarily just have one table in wraparound danger; if there are
multiples, you want to choose between them intelligently, and the fact
that relfrozenxid differs by 1 shouldn't dominate a 2x difference in
the on-disk size.
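One way to get both properties described above is a two-tier ordering, shown here as a Python sketch. Tables past a danger threshold always sort first, but within each tier the score is scaled by size. The `DANGER_XID_DISTANCE` cutoff, the `Rel` type, and the tier split itself are assumptions for illustration, not a PostgreSQL setting or a design from the thread.

```python
from typing import NamedTuple

DANGER_XID_DISTANCE = 1_000_000_000  # hypothetical "real wraparound danger" cutoff


class Rel(NamedTuple):
    name: str
    xid_distance: int   # how far relfrozenxid lags behind the xmin horizon
    total_pages: int    # heap plus indexes
    dead_tuples: int


def in_danger(r: Rel) -> bool:
    return r.xid_distance >= DANGER_XID_DISTANCE


def score(r: Rel) -> int:
    # In the danger tier XID distance drives the score; otherwise bloat does.
    # Either way it is scaled by size, so a 1-XID gap cannot outweigh a 2x size gap.
    benefit = r.xid_distance if in_danger(r) else r.dead_tuples
    return benefit * r.total_pages


def schedule(rels):
    # Danger tier first (False sorts before True), then highest score first.
    return [r.name for r in sorted(rels, key=lambda r: (not in_danger(r), -score(r)))]


rels = [
    Rel("older_small", 1_500_000_001, 1_000, 0),
    Rel("younger_big", 1_500_000_000, 2_000, 0),
    Rel("bloated", 100_000_000, 10_000, 5_000_000),
]
print([r.name for r in sorted(rels, key=lambda r: -r.xid_distance)])
# ['older_small', 'younger_big', 'bloated']
print(schedule(rels))
# ['younger_big', 'older_small', 'bloated']
```

Pure age ordering lets a 1-XID difference decide between the two endangered tables; the tiered score picks the one twice the size, while the merely bloated table still waits until no table is in danger.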

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
