Re: autovacuum not prioritising for-wraparound tables

From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum not prioritising for-wraparound tables
Date: 2013-01-25 17:56:46
Message-ID: CAFNqd5UcZNVPTGwELU3ZWYEs4bqdShmFCffmN_oAYXat9z4apQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 25, 2013 at 12:00 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-01-25 11:51:33 -0500, Tom Lane wrote:
>> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>> > 2. for other tables, consider floor(log(size)). This makes tables of
>> > sizes in the same ballpark be considered together.
>>
>> > 3. For tables of similar size, consider
>> > (n_dead_tuples - threshold) / threshold.
>> > "threshold" is what gets calculated as the number of tuples over which
>> > a table is considered for vacuuming. This number, then, is a relative
>> > measure of how hard is vacuuming needed.
>>
>> The floor(log(size)) part seems like it will have rather arbitrary
>> behavioral shifts when a table grows just past a log boundary. Also,
>> I'm not exactly sure whether you're proposing smaller tables first or
>> bigger tables first, nor that either of those orderings is a good thing.
>
> That seems dubious to me as well.
>
>> I think sorting by just age(relfrozenxid) for for-wraparound tables, and
>> just the n_dead_tuples measurement for others, is probably reasonable
>> for now. If we find out that has bad behaviors then we can look at how
>> to fix them, but I don't think we have enough understanding yet of what
>> the bad behaviors might be.
>
> If we want another ordering criterion than that it might be worth
> thinking about something like n_dead_tuples/relpages to make sure that
> small tables with a high dead tuples ratio get vacuumed in time.

I'd imagine it a good idea to reserve some autovacuum connections for small
tables, that is, to have a maximum relpages for some portion of the
connections.

That way you don't get stuck having all the connections busy working on
huge tables and leaving small tables starved. That scenario seems pretty
obvious.
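
To make the reservation idea concrete, here is a rough Python sketch. It is
not how autovacuum works today; the names pick_table, reserved_workers and
max_small_relpages are hypothetical, made up purely for illustration. The
assumption is that the first few worker slots refuse any table above a page
limit, while the remaining slots take whatever comes first in the list.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    relpages: int


def pick_table(candidates, worker_id, reserved_workers, max_small_relpages):
    """Return the first candidate this worker may take, or None.

    Workers with worker_id < reserved_workers are reserved for small
    tables and skip anything larger than max_small_relpages (hypothetical
    knobs, not existing settings).
    """
    small_only = worker_id < reserved_workers
    for table in candidates:
        if small_only and table.relpages > max_small_relpages:
            continue  # reserved slot: leave the huge table to another worker
        return table
    return None


candidates = [Table("huge", 5_000_000), Table("tiny", 40)]
reserved_pick = pick_table(candidates, 0, 1, 1000)    # reserved slot
unreserved_pick = pick_table(candidates, 2, 1, 1000)  # ordinary slot
```

With one reserved slot, worker 0 skips "huge" and takes "tiny", so the small
table is not starved even if every other worker is busy on big ones.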

I'd be inclined to do something a bit more sophisticated than just
age(relfrozenxid) for wraparound: kick off large tables' wraparound vacuums
earlier than those for smaller tables.

With a little bit of noodling around, here's a thought for a joint function
that I *think* has reasonably common scales:

    f(deadtuples, relpages, age) =
        deadtuples/relpages + e^(age * ln(relpages) / 2^32)

When the age of the table is low, this is dominated by the deadtuples/relpages
part of the equation; you vacuum tables based on which has the most dead
tuples per page.

But when a table is not vacuumed for a long time, the second term will kick
in, and we'll tend to:
a) Vacuum the ones that are largest the earliest, but nonetheless
b) Vacuum them as the ratio of age/2^32 gets close to 1.

This function assumes relpages > 0, and there's a constant, 2^32, there which
might be fiddled with.
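
A minimal Python sketch of the function, to show how the two terms trade off.
The names vacuum_priority and scale are mine, chosen for illustration; the
sample tables are invented numbers, not measurements. Note that
e^(age*ln(relpages)/2^32) is just relpages^(age/2^32), so the second term is
exactly 1 at age 0 and grows to relpages when age reaches 2^32.

```python
import math

XID_SCALE = 2 ** 32  # the constant from the formula; it might be tuned


def vacuum_priority(deadtuples, relpages, age, scale=XID_SCALE):
    """Compute f(deadtuples, relpages, age); requires relpages > 0."""
    if relpages <= 0:
        raise ValueError("relpages must be > 0")
    dead_term = deadtuples / relpages
    # Same value as relpages ** (age / scale), written as in the formula.
    age_term = math.exp(age * math.log(relpages) / scale)
    return dead_term + age_term


# Young tables (age 0): the age term is 1 for both, so dead tuples per
# page decides, and the small, bloated table ranks first.
small_young = vacuum_priority(deadtuples=50, relpages=10, age=0)
large_young = vacuum_priority(deadtuples=1000, relpages=100_000, age=0)

# Old tables with no dead tuples (age = 2^31): the age term is
# sqrt(relpages), so the larger table ranks first.
small_old = vacuum_priority(deadtuples=0, relpages=100, age=2 ** 31)
large_old = vacuum_priority(deadtuples=0, relpages=1_000_000, age=2 ** 31)
```

Sorting candidate tables by this value in descending order gives the
ordering described above: dead-tuple density first while tables are young,
then size once the age term starts to dominate.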
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
