Need help debugging why autovacuum seems "stuck" -- until I use superuser to vacuum freeze pg_database

From: "McCoy, Shawn" <shamccoy(at)amazon(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Need help debugging why autovacuum seems "stuck" -- until I use superuser to vacuum freeze pg_database
Date: 2016-05-02 02:39:02
Message-ID: A9D40BB7-CFD6-46AF-A0A1-249F04878A2A@amazon.com
Lists: pgsql-hackers

I have been debugging a problem on a 9.3.10 Postgres database cluster with over 1200 databases. The cluster runs 10 autovacuum workers, with increased maintenance_work_mem and autovacuum settings tuned to run more frequently than the defaults. What I notice is that autovacuum runs for a week or so and traverses the databases as expected: age(datfrozenxid) for all 1200 databases stays close to autovacuum_freeze_max_age, as desired.

Then, suddenly, it gets “stuck”: the autovacuum launcher stops launching worker processes even though databases start to age past autovacuum_freeze_max_age. If I list the databases sorted by age(datfrozenxid), connect to the oldest one, and execute a simple “vacuum freeze pg_database;”, autovacuum springs back into action.
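The manual triage above (sort by age, pick the oldest) can be sketched as a small helper. This is a minimal sketch, assuming the (datname, age) pairs have already been fetched with something like `SELECT datname, age(datfrozenxid) FROM pg_database;`. The function names are hypothetical, and treating "past the threshold" as a strict greater-than comparison is an assumption, not necessarily the launcher's exact rule.

```python
from typing import List, Optional, Tuple

# Hypothetical input shape: (datname, age(datfrozenxid)) pairs, as returned by
#   SELECT datname, age(datfrozenxid) FROM pg_database;
DbAge = Tuple[str, int]


def overdue_databases(rows: List[DbAge], freeze_max_age: int) -> List[DbAge]:
    """Return databases whose age exceeds freeze_max_age, oldest first.

    Assumption: "past the threshold" means strictly greater than it.
    """
    overdue = [row for row in rows if row[1] > freeze_max_age]
    return sorted(overdue, key=lambda row: row[1], reverse=True)


def oldest_database(rows: List[DbAge]) -> Optional[str]:
    """Return the name of the database with the largest age, or None if empty."""
    if not rows:
        return None
    return max(rows, key=lambda row: row[1])[0]


if __name__ == "__main__":
    # Made-up sample values for illustration only; 200000000 is the stock
    # default for autovacuum_freeze_max_age.
    sample = [("db_a", 150_000_000), ("db_b", 230_000_000), ("db_c", 210_000_000)]
    print(overdue_databases(sample, 200_000_000))
    print(oldest_database(sample))
```

The sketch only identifies the target; the workaround itself is still connecting to that database and running `vacuum freeze pg_database;` by hand.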

It’s never the same database where autovacuum gets “stuck”. I’m attempting to gather more debugging information, but I also can’t understand why simply running “vacuum freeze pg_database” breaks up the jam.

Any thoughts?

Shawn
