Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Steve Kehlet <steve(dot)kehlet(at)gmail(dot)com>, Forums postgresql <pgsql-general(at)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Date: 2015-05-29 02:41:21
Message-ID: 20150529024121.GL5885@postgresql.org
Lists: pgsql-general pgsql-hackers

Robert Haas wrote:

> 2. If you pg_upgrade to 9.3.7 or 9.4.2, then you may have datminmxid
> values which are equal to the next-mxid counter instead of the correct
> value; in other words, they are too new.

What you describe is what happens if you upgrade from 9.2 or earlier.
For this case we use this call:

    exec_prog(UTILITY_LOG_FILE, NULL, true,
              "\"%s/pg_resetxlog\" -m %u,%u \"%s\"",
              new_cluster.bindir,
              old_cluster.controldata.chkpnt_nxtmulti + 1,
              old_cluster.controldata.chkpnt_nxtmulti,
              new_cluster.pgdata);

This uses the old cluster's nextMulti value as oldestMulti in the new
cluster, and that value+1 is used as nextMulti. This is correct: we
don't want to preserve any of the multixact state from the previous
cluster; anything before that value can be truncated with no loss of
critical data. In fact, there is no critical data before that value at
all.

If you upgrade from 9.3, this other call is used instead:

    /*
     * we preserve all files and contents, so we must preserve both "next"
     * counters here and the oldest multi present on system.
     */
    exec_prog(UTILITY_LOG_FILE, NULL, true,
              "\"%s/pg_resetxlog\" -O %u -m %u,%u \"%s\"",
              new_cluster.bindir,
              old_cluster.controldata.chkpnt_nxtmxoff,
              old_cluster.controldata.chkpnt_nxtmulti,
              old_cluster.controldata.chkpnt_oldstMulti,
              new_cluster.pgdata);

In this case we use the oldestMulti from the old cluster as oldestMulti
in the new cluster, which is also correct.
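
To make the argument mapping explicit, here is a minimal standalone sketch
of the two cases. It is not pg_upgrade's code: ControlDataSketch, ResetPlan,
plan_multixact_reset and format_reset_command are hypothetical names made up
for illustration. Only the option shapes ("-O offset" and "-m next,oldest")
are taken from the two calls quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the old cluster's control data fields. */
typedef struct
{
    uint32_t chkpnt_nxtmulti;   /* next multixact ID to be assigned */
    uint32_t chkpnt_oldstMulti; /* oldest multixact ID still present */
    uint32_t chkpnt_nxtmxoff;   /* next multixact member offset */
} ControlDataSketch;

/* Hypothetical description of what gets passed to pg_resetxlog. */
typedef struct
{
    bool     set_offset;   /* pass -O at all? */
    uint32_t next_offset;  /* value for -O, if set_offset */
    uint32_t next_multi;   /* first value of -m */
    uint32_t oldest_multi; /* second value of -m */
} ResetPlan;

static ResetPlan
plan_multixact_reset(const ControlDataSketch *old, bool old_is_93_or_later)
{
    ResetPlan plan;

    if (old_is_93_or_later)
    {
        /* Files are preserved, so carry over both counters and the oldest. */
        plan.set_offset = true;
        plan.next_offset = old->chkpnt_nxtmxoff;
        plan.next_multi = old->chkpnt_nxtmulti;
        plan.oldest_multi = old->chkpnt_oldstMulti;
    }
    else
    {
        /* No multixact state is preserved: oldest = old next, next = +1. */
        plan.set_offset = false;
        plan.next_offset = 0;
        plan.next_multi = old->chkpnt_nxtmulti + 1;
        plan.oldest_multi = old->chkpnt_nxtmulti;
    }
    return plan;
}

/* Build the command line; returns what snprintf returns. */
static int
format_reset_command(char *buf, size_t len, const char *bindir,
                     const char *pgdata, const ResetPlan *plan)
{
    assert(buf != NULL && len > 0);

    if (plan->set_offset)
        return snprintf(buf, len,
                        "\"%s/pg_resetxlog\" -O %u -m %u,%u \"%s\"",
                        bindir, (unsigned) plan->next_offset,
                        (unsigned) plan->next_multi,
                        (unsigned) plan->oldest_multi, pgdata);

    return snprintf(buf, len, "\"%s/pg_resetxlog\" -m %u,%u \"%s\"",
                    bindir, (unsigned) plan->next_multi,
                    (unsigned) plan->oldest_multi, pgdata);
}
```

With an old cluster whose nextMulti is 1000, the pre-9.3 branch yields
next = 1001 and oldest = 1000, while the 9.3 branch passes the old cluster's
own next and oldest values through unchanged.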

> A. Most obviously, we should fix pg_upgrade so that it installs
> chkpnt_oldstMulti instead of chkpnt_nxtmulti into datfrozenxid, so
> that we stop creating new instances of this problem. That won't get
> us out of the hole we've dug for ourselves, but we can at least try to
> stop digging. (This is assuming I'm right that chkpnt_nxtmulti is the
> wrong thing - anyone want to double-check me on that one?)

I don't think there's anything that we need to fix here.

> B. We need to change find_multixact_start() to fail softly. This is
> important because it's legitimate for it to fail in recovery, as
> discussed upthread, and also because we probably want to eliminate the
> fail-to-start hazard introduced in 9.4.2 and 9.3.7.
> find_multixact_start() is used in three places, and they each require
> separate handling:
>
> - In SetMultiXactIdLimit, find_multixact_start() is used to set
> MultiXactState->oldestOffset, which is used to determine how
> aggressively to vacuum. If find_multixact_start() fails, we don't
> know how aggressively we need to vacuum to prevent members wraparound;
> it's probably best to decide to vacuum as aggressively as possible.
> Of course, if we're in recovery, we won't vacuum either way; the fact
> that it fails softly is good enough.

Sounds good.
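
For concreteness, a rough sketch of what "fail softly" could look like for
this caller. Everything here is an assumption made for illustration: the
toy OffsetEntrySketch table stands in for the offsets SLRU, and
find_multixact_start_soft, LimitStateSketch, set_limit_sketch and
vacuum_aggressively are hypothetical names, not the functions in
multixact.c.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

/* Toy stand-in for one offsets entry; on_disk = false models a missing file. */
typedef struct
{
    MultiXactId     multi;
    MultiXactOffset start;
    bool            on_disk;
} OffsetEntrySketch;

/* Report failure through the return value instead of raising an error. */
static bool
find_multixact_start_soft(const OffsetEntrySketch *tab, int n,
                          MultiXactId multi, MultiXactOffset *result)
{
    for (int i = 0; i < n; i++)
    {
        if (tab[i].multi == multi && tab[i].on_disk)
        {
            *result = tab[i].start;
            return true;
        }
    }
    return false;
}

typedef struct
{
    bool            oldest_offset_known;
    MultiXactOffset oldest_offset;
} LimitStateSketch;

/* Remember whether the oldest offset could be determined at all. */
static void
set_limit_sketch(LimitStateSketch *state, const OffsetEntrySketch *tab,
                 int n, MultiXactId oldest_multi)
{
    MultiXactOffset off;

    if (find_multixact_start_soft(tab, n, oldest_multi, &off))
    {
        state->oldest_offset_known = true;
        state->oldest_offset = off;
    }
    else
    {
        state->oldest_offset_known = false;
        state->oldest_offset = 0;
    }
}

/*
 * If the oldest offset is unknown, vacuum as aggressively as possible.
 * Otherwise compare members in use against a (hypothetical) threshold;
 * unsigned subtraction keeps the distance correct across wraparound.
 */
static bool
vacuum_aggressively(const LimitStateSketch *state,
                    MultiXactOffset next_offset, uint32_t threshold)
{
    if (!state->oldest_offset_known)
        return true;
    return (MultiXactOffset) (next_offset - state->oldest_offset) > threshold;
}
```

The point of the extra flag is that "unknown" is kept distinct from any
real offset value, so the caller can pick the conservative behavior without
guessing.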

> - In DetermineSafeOldestOffset, find_multixact_start() is used to set
> MultiXactState->offsetStopLimit. If it fails here, we don't know when
> to refuse multixact creation to prevent wraparound. Again, in
> recovery, that's fine. If it happens in normal running, it's not
> clear what to do. Refusing multixact creation is an awfully blunt
> instrument. Maybe we can scan pg_multixact/offsets to determine a
> workable stop limit: the first file greater than the current file that
> exists, minus two segments, is a good stop point. Perhaps we ought to
> use this mechanism here categorically, not just when
> find_multixact_start() fails. It might be more robust than what we
> have now.

Blunt instruments have the desirable property of being simple. We don't
want any more clockwork here, I think --- this stuff is pretty
complicated already. As far as I understand, if during normal running
we see that find_multixact_start has failed, sufficient vacuuming should
get it straight eventually with no loss of data.

> - In TruncateMultiXact, find_multixact_start() is used to set the
> truncation point for the members SLRU. If it fails here, I'm guessing
> the right solution is not to truncate anything - instead, rely on
> intense vacuuming to eventually advance oldestMXact to a value whose
> member data still exists; truncate then.

Fine.
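
Roughly, the "don't truncate anything" behavior could be sketched as below.
This is only an illustration under stated assumptions: MembersSketch,
StartLookup, stub_lookup and truncate_members_sketch are hypothetical, and
the stub merely pretends that offsets below multixact 100 are gone.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

/* Hypothetical soft lookup: returns false when the start cannot be read. */
typedef bool (*StartLookup) (MultiXactId multi, MultiXactOffset *result);

/* Toy record of what has been truncated so far. */
typedef struct
{
    MultiXactOffset members_truncated_to;
    int             truncations;
} MembersSketch;

/* Stub for illustration: offsets for multis below 100 are "missing". */
static bool
stub_lookup(MultiXactId multi, MultiXactOffset *result)
{
    if (multi < 100)
        return false;
    *result = multi * 2;        /* arbitrary made-up member offset */
    return true;
}

/*
 * Truncate members only when the start offset of the oldest multixact is
 * known.  On failure nothing is touched; a later call can succeed once
 * vacuum has advanced the oldest multixact to one whose data still exists.
 */
static bool
truncate_members_sketch(MembersSketch *m, MultiXactId oldest_multi,
                        StartLookup lookup)
{
    MultiXactOffset start;

    if (!lookup(oldest_multi, &start))
        return false;

    m->members_truncated_to = start;
    m->truncations++;
    return true;
}
```

Calling this with an oldest multixact of 50 leaves the state untouched;
calling it again with 150, as if vacuum had advanced the horizon in the
meantime, performs the truncation.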

> C. I think we should also change TruncateMultiXact() to truncate
> offsets first, and then members. As things stand, if we truncate
> members first, we increase the risk of seeing an offset that will fail
> when passed to find_multixact_start(), because TruncateMultiXact()
> might get interrupted before it finishes. That seems like an
> unnecessary risk.

Not sure about this point. We did it the way you propose previously,
and found it to be a problem because sometimes we tried to read an
offset file that was no longer there. Do we really read member files
anywhere? I thought we only tried to read offset files. If we remove
member files, what is it that we try to read and find not to be present?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
