Re: Make autovacuum sort tables in descending order of xid_age

From: Mark Dilger <hornschnorter(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Christophe Pettus <xof(at)thebuild(dot)com>
Subject: Re: Make autovacuum sort tables in descending order of xid_age
Date: 2019-12-12 21:35:49
Message-ID: 2ad2a9fa-32eb-29aa-07ee-f0fe75ad4db5@gmail.com
Lists: pgsql-hackers

On 12/12/19 11:26 AM, David Fetter wrote:
> On Thu, Dec 12, 2019 at 08:02:25AM -0800, Mark Dilger wrote:
>> On 11/30/19 2:23 PM, David Fetter wrote:
>>> On Sat, Nov 30, 2019 at 10:04:07AM -0800, Mark Dilger wrote:
>>>> On 11/29/19 2:21 PM, David Fetter wrote:
>>>>> On Fri, Nov 29, 2019 at 07:01:39PM +0100, David Fetter wrote:
>>>>>> Folks,
>>>>>>
>>>>>> Per a suggestion Christophe made, please find attached a patch to
>>>>>> $Subject:
>>>>>>
>>>>>> Apart from carefully fudging with pg_resetwal, and short of running in
>>>>>> production for a few weeks, what would be some good ways to test this?
>>>>>
>>>>> Per discussion on IRC with Sehrope Sarkuni, please find attached a
>>>>> patch with one fewer bug, this one in the repalloc() calls.
>>>>
>>>> Hello David,
>>>>
>>>> Here are my initial thoughts.
>>>>
>>>> Although you appear to be tackling the problem of vacuuming tables
>>>> with older Xids first *per database*,
>>>
>>> Yes, that's what's come up for me in production, but lately,
>>> production has consisted of a single active DB maxing out hardware. I
>>> can see how in other situations--multi-tenant, especially--it would
>>> make more sense to sort the DBs first.
>>
>> I notice you don't address that in your latest patch. Do you have
>> any thoughts on whether that needs to be handled in this patch?
>
> My thought is that it doesn't.

I can live with that for now. I'd like the design to be compatible with
revisiting that in a subsequent patch.

>>>> I have not tested this change, but I may do so later today or perhaps
>>>> on Monday.
>>
>> The code compiles cleanly and passes all regression tests, but I don't
>> think those tests really cover what you are changing. Have you been
>> using any test framework for this?
>
> I don't have one :/

We need to get that fixed.

>> I wonder if you might add information about table size, table changes,
>> and bloat to your RelFrozenXidAge struct and modify rfxa_comparator to
>> use a heuristic to cost the (age, size, bloat, changed) grouping and
>> sort on that cost, such that really large bloated tables with old xids
>> might get vacuumed before smaller, less bloated tables that have
>> even older xids. Sorting the tables based purely on xid_age seems to
>> ignore other factors that are worth considering. I do not have a
>> formula for how those four factors should be weighted in the heuristic,
>> but you are implicitly assigning three of them a weight of zero in
>> your current patch.
>
> I think it's vastly premature to come up with complex sorting systems
> right now. Just sorting in descending order of age should either have
> or not have positive effects.

I hear what you are saying, but I'm going to argue the other side.

Let C = 1.00000002065
Let x = xid_age for a table
Let v = clamp(n_dead_tuples / reltuples*2) to max 0.5
Let a = clamp(changes_since_analyze / reltuples) to max 0.5

Let score = C**x + v + a

With x = 1 million => C**x = 1.02
x = 200 million => C**x = 62.2
x = 2**32 => C**x = FLT_MAX - delta

The maximum contribution to the score that n_dead_tuples and
changes_since_analyze can make is 1.0. Once the xid age reaches one
million, C**x alone (about 1.02) exceeds that maximum and starts to be
the dominant factor. By the time it reaches the default value of 200
million for autovacuum_freeze_max_age, it is far and away the dominant
factor. The score also never overflows FLT_MAX, since no xid age stored
in the uint32 your current patch uses can exceed 2**32.

The computed score is a 32-bit float, which takes no more memory
to store than the xid_age field you are storing. So storing the
score rather than the xid age uses the same memory as your
patch.
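To make the memory comparison concrete, here is a hypothetical entry type with a float score in place of the uint32 xid age, plus a descending comparator for qsort(). RelScore, its fields, and rel_score_comparator are illustrative names, not the ones in the patch. Oid is typedef'd locally as a stand-in for PostgreSQL's own, and the _Static_assert assumes a platform where float is 32 bits, which covers the usual ones.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;   /* local stand-in for PostgreSQL's Oid */

/* A float score occupies the same space as a uint32 xid age. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "float score must be no larger than a uint32 xid age");

/* Hypothetical entry: score replaces the xid age field. */
typedef struct RelScore
{
    Oid   relid;
    float score;
} RelScore;

/*
 * qsort() comparator, highest score first. Compares explicitly instead of
 * subtracting, since converting a float difference to int would truncate
 * small differences to 0.
 */
static int
rel_score_comparator(const void *a, const void *b)
{
    float sa = ((const RelScore *) a)->score;
    float sb = ((const RelScore *) b)->score;

    if (sa > sb)
        return -1;
    if (sa < sb)
        return 1;
    return 0;
}

static void
sort_rels_by_score(RelScore *rels, size_t n)
{
    qsort(rels, n, sizeof(RelScore), rel_score_comparator);
}
```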

I doubt the computation time for the exponential is relevant
compared to the n*log(n) average sorting time of the quicksort.
It is even less relevant compared to the time it takes to vacuum
the tables. I doubt my proposal has a measurable run-time impact.

On the upside, if you have a database with autovacuum configured
aggressively, the tables most in need get vacuumed first, with need
computed relative to vac_scale_factor and anl_scale_factor. That
helps a different use case than yours. The xid age problem might not
exist for databases where autovacuum has enough resources to never
fall behind. Those databases will have other priorities for where
autovacuum spends its time.
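For that aggressively tuned case, "need" could be a graded ratio rather than a boolean, measured against the same kind of threshold relation_needs_vacanalyze() compares with (a base threshold plus the scale factor times reltuples). The sketch below assumes that form; VacNeed and compute_vac_need are hypothetical names.

```c
/*
 * Hypothetical graded "need", modeled on the thresholds
 *   vacthresh = vac_base_thresh + vac_scale_factor * reltuples
 *   anlthresh = anl_base_thresh + anl_scale_factor * reltuples
 * A ratio >= 1.0 means the threshold has been crossed; larger values say
 * by how much, which a boolean cannot express.
 */
typedef struct VacNeed
{
    double vacuum_need;
    double analyze_need;
} VacNeed;

static VacNeed
compute_vac_need(double reltuples, double n_dead_tuples,
                 double changes_since_analyze,
                 double vac_base_thresh, double vac_scale_factor,
                 double anl_base_thresh, double anl_scale_factor)
{
    VacNeed need;
    double  vacthresh = vac_base_thresh + vac_scale_factor * reltuples;
    double  anlthresh = anl_base_thresh + anl_scale_factor * reltuples;

    need.vacuum_need = vacthresh > 0.0 ? n_dead_tuples / vacthresh : 0.0;
    need.analyze_need = anlthresh > 0.0 ? changes_since_analyze / anlthresh : 0.0;
    return need;
}
```

With the default settings (autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.2), a 10,000-row table with 4,100 dead tuples has a vacuum need of 2.0, i.e. twice its trigger point.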

I'm imagining coming back with two patches later, one that does
something more about choosing which database to vacuum first, and
another that recomputes which table to vacuum next when a worker
finishes vacuuming a table. These combined could help keep tables
that are sensitive to statistics changes vacuumed more frequently
than others.
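The "recompute which table to vacuum next" idea could look roughly like this: rather than walking a list sorted once, the worker takes the highest current score among the tables still pending each time it finishes one. PendingRel and pick_next_rel are hypothetical, and how the scores array gets refreshed between picks is left to the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-table entry in a worker's to-do list. */
typedef struct PendingRel
{
    unsigned int relid;
    bool         done;
} PendingRel;

/*
 * Pick the pending table with the highest score in current_scores (which
 * the caller refreshes before each call), mark it done, and return its
 * index, or -1 when nothing is left.
 */
static ptrdiff_t
pick_next_rel(PendingRel *rels, const float *current_scores, size_t n)
{
    ptrdiff_t best = -1;
    float     best_score = 0.0f;

    for (size_t i = 0; i < n; i++)
    {
        if (rels[i].done)
            continue;
        if (best < 0 || current_scores[i] > best_score)
        {
            best = (ptrdiff_t) i;
            best_score = current_scores[i];
        }
    }
    if (best >= 0)
        rels[best].done = true;
    return best;
}
```

Each pick is a linear scan, so a full pass is O(n^2). With many tables, a heap keyed on score would be cheaper, at the cost of re-keying entries whose scores change.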

>> relation_needs_vacanalyze currently checks the reltuples, n_dead_tuples
>> and changes_since_analyze along with vac_scale_factor and
>> anl_scale_factor for the relation, but only returns booleans dovacuum,
>> doanalyze, and wraparound.
>
> Yeah, I looked at that. It's for a vastly different purpose, namely
> deciding what's an emergency and what's probably not, but needs
> attention anyhow. My goal was something a little finer-grained and, I
> hope, a little easier to establish the (lack of) benefits because only
> one thing is getting changed.

That's all I'll say for now. Hopefully other members of the
community will weigh in.

--
Mark Dilger
