Quick Links

Re: We need to log aborted autovacuums

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: We need to log aborted autovacuums
Date:	2011-01-07 19:00:20
Message-ID:	4D2762C4.4050405@agliodbs.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Greg,

> It's already possible to detect the main symptom--dead row percentage is
> much higher than the autovacuum threshold, but there's been no recent
> autovacuum. That makes me less enthusiastic that there's such a genuine
> need to justify the overhead of storing more table stats just to detect
> the same thing a little more easily.

The problem is that while this gives you the symptoms, it doesn't give
you the cause. The lack of vacuum could be occurring for any of 4 reasons:

1) Locking
2) You have a lot of tables and not enough autovac_workers / too much
sleep time
3) You need to autovac this particular table more frequently, since it
gets dirtied really fast
4) The table has been set with special autovac settings which keep it
from being autovac'd

We can currently distinguish between cased 2, 3, 4 based on existing
available facts. However, distinguishing case 1 from 2 or 3, in
particular, isn't currently possible except by methods which require
collecting a lot of ad-hoc monitoring data over a period of time. This
makes the effort required for the diagnosis completely out of proportion
with the magnitude of the problem.

It occurs to me that another way of diagnosis would simply be a way to
cause the autovac daemon to spit out output we could camp on, *without*
requiring the huge volumes of output also required for DEBUG3. This
brings us back to the logging idea again.

> We could argue both sides of the trade-off of tracking this directly in
> stats for some time, and I'd never expect there to be a clear victory
> for either perspective. I've run into this vacuum problem a few times,
> but certainly less than I've run into "why is the stats table so huge?"

I really don't think that argument applies to either patch;
last_autovac_attempt *or* the last_stats_reset time, since neither event
is expected to occur frequently. If you have last_autovac_attempt (for
example) being updated frequently, you clearly had a db problem bigger
than the size of the stats table.

Given that our road ahead necessarily includes adding more and more
monitoring and admin data to PostgreSQL (because our DBA users demand
it), I think that we should give some thought to the issue of storing
DBA stats in general. By DBA stats I mean statistics which aren't used
in query planning, but are used in monitoring, trending, and
troubleshooting. I'm thinking these ought to have their own relation or
relations.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

In response to

Re: We need to log aborted autovacuums at 2011-01-07 04:33:23 from Greg Smith

Responses

Re: We need to log aborted autovacuums at 2011-01-08 01:15:12 from Greg Smith
Re: We need to log aborted autovacuums at 2011-01-15 09:47:56 from Greg Smith

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Josh Berkus	2011-01-07 19:15:40	Re: making an unlogged table logged
Previous Message	Magnus Hagander	2011-01-07 18:48:57	Re: system views for walsender activity