Re: autovacuum not prioritising for-wraparound tables

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum not prioritising for-wraparound tables
Date: 2013-01-29 03:03:19
Message-ID: CA+TgmoYofaowOCgJjVgx-Er9ErfCk+c9XsU4GLQuJeFULNxRfA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 27, 2013 at 2:17 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Fri, Jan 25, 2013 at 9:19 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I think that to do this right, we need to consider not only the status
>> quo but the trajectory. For example, suppose we have two tables to
>> process, one of which needs a wraparound vacuum and the other one of
>> which needs dead tuples removed. If the table needing the wraparound
>> vacuum is small and just barely over the threshold, it isn't urgent;
>
> But it being small, it also won't take long to vacuum. Why not just do it?

Because "big" and "small" are relative terms.

>> but if it's large and way over the threshold, it's quite urgent.
>> Similarly, if the table which needs dead tuples removed is rarely
>> updated, postponing vacuum is not a big deal, but if it's being
>> updated like crazy, postponing vacuum is a big problem.
>
> I don't see this as being the case. If it is being updated like
> crazy, it doesn't matter whether it meets the threshold to have tuples
> removed *right at the moment* or not. It will meet that threshold
> soon. If you can't keep up with that need with your current settings,
> you have a steady-state problem. Changing the order, or not changing
> the order, isn't going to make a whole lot of difference, you need to
> overcome the steady-state problem.

Sure. There are many people for whom vacuum has no trouble at all
keeping up, and others for whom it isn't even close to keeping up.
People in the first category aren't likely to be damaged by the
proposed change and people in the second category aren't likely to be
helped. The issue is around what happens for people who are close to
the edge. Will things get better or worse? Alvaro (and Simon)
contend that there will be cases where full-cluster shutdowns that
happen under today's algorithm would be avoided if we prioritize
anti-wraparound vacuums over dead-tuple-cleanup vacuums. I believe
that. I also believe that there will be cases where it goes the other
way - where a bloat situation that remains under control with today's
algorithm gets perturbed just enough by this change to cause
runaway table bloat. Or at least, I contend that we don't have nearly
enough evidence that that *won't* happen to risk back-patching a
change of this type.

In my experience, full-cluster shutdowns caused by autovacuum failing
to advance datfrozenxid are extremely rare - and if they do happen,
it's usually because the vacuum cost delay is set too high, or the
cost limit too low. If we want to attack the problem of making sure
such shutdowns don't happen, I'd argue that the most promising way to
attack that problem is to progressively ratchet the delay down and the
cost limit up as age(relfrozenxid) gets larger. On the other hand,
problems with runaway table bloat are relatively common. Heikki's
8.4-era changes have of course helped quite a bit, but the problem is
still very, very common. All you need is a series of "long"-running
transactions (like a couple of *minutes* on a busy system), or a
vacuum cost delay that is just ever-so-slightly too high, and you're
completely hosed. I agree with you that if you've got a database
that's well-tuned, so that you aren't skating on the ragged edge of
disaster, this change probably won't break anything. But I am willing
to bet that there are people out there who are, completely
unknowingly, skating on that ragged edge. It is not as if we provide
an easy way to know whether you've got the cost delay set optimally.
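
To make the ratcheting idea concrete, here is a minimal sketch of one
way it could look. It is not existing autovacuum behavior. The names
CostSettings and scaled_cost_settings, the linear interpolation, and
all the numeric thresholds below are assumptions made up for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostSettings:
    delay_ms: float  # sleep between cost-limited work batches
    limit: int       # cost budget accumulated before sleeping


def scaled_cost_settings(
    xid_age: int,
    base: CostSettings,
    freeze_max_age: int = 200_000_000,  # assumed point where ratcheting starts
    hard_limit: int = 2_000_000_000,    # assumed age at which we want full speed
    min_delay_ms: float = 0.0,          # assumed floor for the delay
    max_limit: int = 10_000,            # assumed ceiling for the cost limit
) -> CostSettings:
    """Hypothetical policy: interpolate linearly from the base settings
    toward (min_delay_ms, max_limit) as age(relfrozenxid) grows."""
    if xid_age <= freeze_max_age:
        return base
    span = hard_limit - freeze_max_age
    urgency = min(1.0, (xid_age - freeze_max_age) / span)
    delay = base.delay_ms + (min_delay_ms - base.delay_ms) * urgency
    limit = round(base.limit + (max_limit - base.limit) * urgency)
    return CostSettings(delay, limit)


if __name__ == "__main__":
    base = CostSettings(delay_ms=20.0, limit=200)  # illustrative values
    for age in (100_000_000, 1_100_000_000, 2_000_000_000):
        print(age, scaled_cost_settings(age, base))
```

A table that is barely over the threshold is vacuumed at close to the
configured pace, so it does not steal I/O from bloat cleanup, while a
table that is far over gets progressively less throttling.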

>> Categorically
>> putting autovacuum wraparound tables ahead of everything else seems
>> simplistic, and thinking that more dead tuples is more urgent than
>> fewer dead tuples seems *extremely* simplistic.
>>
>> I ran across a real-world case where a user had a small table that had
>> to be vacuumed every 15 seconds to prevent bloat. If we change the
>> algorithm in a way that gives other things priority over that table,
>
> Eventually an anti-wrap around is going to be done, and once it starts
> it does have priority, because things already underway don't get
> preempted. Have they ever reached that point? Did it cause problems?

In that specific case, I don't know.

>> then that user could easily get hosed when they install a maintenance
>> release containing this change.
>
> Yeah, I don't know that back-patching is a good idea, or at least not soon.

That's all I'm arguing. I think it would be nice to do something for
9.3, preferably a little more sophisticated than just "put all
anti-wraparound vacuums first".
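
As a rough sketch of what "a little more sophisticated" might mean,
one could order tables by how much slack each has left rather than by
category. Everything here is hypothetical: the Candidate fields, the
slack formula, and the sample numbers are assumptions for
illustration, not a proposed patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    headroom: float        # units left before trouble (XIDs or dead tuples)
    burn_rate: float       # units consumed per second (the trajectory)
    vacuum_seconds: float  # estimated time needed to vacuum the table


def slack_seconds(c: Candidate) -> float:
    """Seconds we can still wait before starting the vacuum.
    Smaller means more urgent; a table that is not moving never expires."""
    if c.burn_rate <= 0:
        return float("inf")
    return c.headroom / c.burn_rate - c.vacuum_seconds


def vacuum_order(candidates: list[Candidate]) -> list[Candidate]:
    # Most urgent first, regardless of whether the reason is
    # wraparound or dead tuples.
    return sorted(candidates, key=slack_seconds)


if __name__ == "__main__":
    tables = [
        # Small table barely over the wraparound threshold: lots of slack.
        Candidate("small_old", 1_800_000_000, 1_000, 5),
        # Large table far over the threshold: long vacuum, little headroom.
        Candidate("big_old", 50_000_000, 1_000, 40_000),
        # Hot queue table that bloats within seconds.
        Candidate("hot_queue", 3_000, 200, 1),
    ]
    for t in vacuum_order(tables):
        print(t.name, slack_seconds(t))
```

With these made-up numbers the hot table goes first, the large
far-over-threshold table second, and the small barely-over-threshold
table last, which matches the intuition in the example upthread.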

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
