Re: BUG #11264: Auto vacuum wraparound job blocking everything

From: David Gould <daveg(at)sonic(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #11264: Auto vacuum wraparound job blocking everything
Date: 2014-09-30 00:44:11
Message-ID: loom.20140930T022539-927@post.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Alvaro Herrera <alvherre <at> 2ndquadrant.com> writes:

> Well, yes, 9.3.4 had a bug fixed by this commit:
>
> Author: Bruce Momjian <bruce <at> momjian.us>
> Branch: master [a61daa14d] 2014-07-02 15:29:38 -0400
> Branch: REL9_4_STABLE [b446a384b] 2014-07-02 15:29:38 -0400
> Branch: REL9_3_STABLE Release: REL9_3_5 [3d2e18510] 2014-07-02 15:29:38
-0400
>
> pg_upgrade: preserve database and relation minmxid values
>
> Also set these values for pre-9.3 old clusters that don't have values
to
> preserve.
>
> Analysis by Alvaro
>
> Backpatch through 9.3
>
> > How do we fix the current issue with this one server? Is there an easy
fix?
> > Thanks.
>
> As far as I am aware, you should
> UPDATE pg_database SET datminmxid=20783
>
> and that should fix it. The oldestMulti value in pg_control would get
> updated by itself some time later. If you experience stalls before
> oldestMulti fixes itself, you could stop the server (cleanly!) and then
> pg_resetxlog -m x,y where x is the correct nextMulti value from
> pg_controldata and y is 20783.

We ran into this on two instances last night and I'm starting to worry that
many others may hit it as well. As far as I can tell any postgres instance
that was upgraded to 9.3 by a pre-9.3.5 pg_upgrade is at risk as they all
will have:

Latest checkpoint's oldestMultiXid: 1

The workaround above is good if you still have the old cluster around from
before the upgrade. We did not, that was all cleaned up months ago. Which
raises the question: how do you fix a 9.3 instance that has oldestMultiXid =
1 without access to the pre-upgrade instance. That is, where do you get the
correct value of oldestMultiXid to set into pg_database.datxminmxid?

I took a guess that the oldest pg_class.relminmxid for all the tables in all
the databases would be ok and updated pg_database.datminmxid witt that. That
is, in each database I ran:

select relname, relminmxid, age(relminmxid)
from pg_class
where relkind = 'r'
and age(relminmxid) < 2147483647
order by 3 desc limit 1 ;

And then used the oldest one to update pg_database.datminmxid. After a
checkpoint and shutdown/restart I see that it was written to pg_controldata
too. Afterwards I was able to run a vacuum freeze on the problem table.

Questions:

1) Is the procedure above safe and effective for this, or did I just hose my
databases?

2) If the procedure above is not safe, what do we do instead?

3) Is this likely to affect all 9.3.x<5 upgraded databases? If so, how do we
fix it before too much damage happens in the wider world?

-dg

--
David Gould daveg <at> sonic.net
If simplicity worked, the world would be overrun with insects.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2014-09-30 05:15:30 Re: BUG #11518: full_page_writes is off after executing pg_start_backup()
Previous Message thangalin 2014-09-29 21:21:06 BUG #11520: Cannot uncheck to set false