Re: BUG #11264: Auto vacuum wraparound job blocking everything

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: David Gould <daveg(at)sonic(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #11264: Auto vacuum wraparound job blocking everything
Date: 2014-10-01 21:32:12
Message-ID: 20141001213211.GW5311@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

David Gould wrote:

> We ran into this on two instances last night and I'm starting to worry that
> many others may hit it as well. As far as I can tell any postgres instance
> that was upgraded to 9.3 by a pre-9.3.5 pg_upgrade is at risk as they all
> will have:
>
> Latest checkpoint's oldestMultiXid: 1
>
> The workaround above is good if you still have the old cluster around from
> before the upgrade. We did not, that was all cleaned up months ago. Which
> raises the question: how do you fix a 9.3 instance that has oldestMultiXid =
> 1 without access to the pre-upgrade instance. That is, where do you get the
> correct value of oldestMultiXid to set into pg_database.datxminmxid?
>
> I took a guess that the oldest pg_class.relminmxid for all the tables in all
> the databases would be ok and updated pg_database.datminmxid witt that. That
> is, in each database I ran:
>
> select relname, relminmxid, age(relminmxid)
> from pg_class
> where relkind = 'r'
> and age(relminmxid) < 2147483647
> order by 3 desc limit 1 ;

Unfortunately, the age() function is only valid for transaction IDs, not
multixacts. In certain cases the values returned might be sensible, but
they don't have to be, and if the counters are not relatively close to
one another, most likely they won't be sensible. A new function was
added to 9.4, but that doesn't help those in 9.3.
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=36ad1a87a368d837d5c2260b2fd2ff83e976425a
(I guess you could have an extension containing that function.)
My bet is that is the reason you had to add the condition that the value
is less than 2^31 - 1.

Offhand, other than the way to obtain the multixact age, the procedure
seems sensible.

A better way not involving mxid_age() would be to use pg_controldata to
extract the current value of the mxid counter, then subtract the current
relminmxid from that value.

> 3) Is this likely to affect all 9.3.x<5 upgraded databases?

Yes.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Burgess, Freddie 2014-10-02 03:50:26 Unreliable results returned with PostgreSQL Spatial query?
Previous Message eric 2014-10-01 21:26:47 BUG #11551: copy and paste issues in pgadmin SQL editor