Skip site navigation (1) Skip section navigation (2)

Re: We need to log aborted autovacuums

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: We need to log aborted autovacuums
Date: 2011-01-05 06:55:02
Message-ID: 4D2415C6.6090407@2ndquadrant.com (view raw or flat)
Thread:
Lists: pgsql-hackers
Josh Berkus wrote:
> I've been trying to diagnose in a production database why certain tables
> never get autovacuumed despite having a substantial % of updates.  The
> obvious reason is locks blocking autovacuum from vacuuming the table ...
>   

Missed this dicussion when it popped up but have plenty to say about it 
now.  What I do here is look for such anomolies using 
pg_stat_user_tables, that the dead rows number has exceeded the vacuum 
threshold.  That comparison is logged in the code at DEBUG3:

        elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), anl: %.0f 
(threshold %.0f)",
             NameStr(classForm->relname),
             vactuples, vacthresh, anltuples, anlthresh);

But a rough computation isn't too hard to derive in a report, so long as 
you haven't customized per-table parameters.  As you suggested in your 
message, the really bad cases here stick out a whole lot.  If you pay 
the slightest amount of attention to the dead row percentages they jump 
right out at you.  This all works easily on any version back to 8.3.  
Not having as much relevant data stored in pg_stat_user_tables makes the 
problem cases less obvious to spot in older versions.

If I start seeing these badly maintained tables and suspect locking is 
getting in the way, I then dump traces from pg_locks+pg_stat_activity 
often enough that I can estimate how often someone has an interfering 
lock and what they're doing.

Should the log level on this message go up from DEBUG3?  I could see 
rewriting it so that it logs at DEBUG1 instead when 
Log_autovacuum_min_duration is set *and* when the trigger threshold is 
crossed, and at DEBUG3 the rest of the time.  Given you can derive this 
with a bit of work in userland, I don't see this even being justified as 
an INFO or LOG level message.  Anytime I can script a SQL-level monitor 
for something that's easy to tie into Nagios or something, I greatly 
prefer that to log file scraping for it anyway.

> What I'd like to do is add some logging code to autovacuum.c so that if
> log_autovacuum is any value other than -1, failure to vacuum due to
> locks gets logged.   Does this make sense?
>   

The general idea is interesting and probably more productive for the 
situation you theorize is happening then messing with the logging 
discussed above.  But that's not where the code needs to go--the lock 
isn't opened until much further down the function call stack.  Attached 
quickie and only tested for compilation patch probably does what you 
want here.  Since this would eliminate the messy follow-up step I 
sometimes have gone through, dumping pg_locks data to confirm or rule 
out locking issues messing with AV processing, I can see some potential 
that it may have simplified situations I've ran into in the past.  And 
it's not out of line with the logging verbosity of similar failure mode 
tests that follow it.  Right now failure to acquire a lock is just not 
considered a log-worthy issue, and I agree that it's worth considering 
whether it should be. 

If you could gather more info on whether this logging catches the 
problem cases you're seeing, that would really be the right test for the 
patch's usefulness.  I'd give you solid 50/50 odds that you've correctly 
diagnosed the issue, and knowing for sure would make advocating for this 
logging a pretty easy sell to me at least.

-- 
Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Attachment: av-lock-failure-v1.diff
Description: text/x-patch (565 bytes)

In response to

Responses

pgsql-hackers by date

Next:From: David FetterDate: 2011-01-05 07:09:57
Subject: Re: Re: new patch of MERGE (merge_204) & a question about duplicated ctid
Previous:From: Jeff DavisDate: 2011-01-05 05:54:16
Subject: Re: WIP: Range Types

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group