Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Steve Kehlet <steve(dot)kehlet(at)gmail(dot)com>, Forums postgresql <pgsql-general(at)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Date: 2015-06-04 23:47:43
Message-ID: CAEepm=2dzNcdKLs=cJEzXuvNXTj-6CMou1JT9g5uzsU6dErNcg@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Fri, Jun 5, 2015 at 9:29 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Here's a new version with some more fixes and improvements:
>
> - SetOffsetVacuumLimit was failing to set MultiXactState->oldestOffset
> when the oldest offset became known if the now-known value happened to
> be zero. Fixed.
>
> - SetOffsetVacuumLimit now logs useful information at the DEBUG1
> level, so that you can see that it's doing what it's supposed to.
>
> - TruncateMultiXact now calls DetermineSafeOldestOffset to adjust the
> offsetStopLimit even if it can't truncate anything. This seems
> useless, but it's not, because it may be that the last checkpoint
> advanced lastCheckpointedOldest from a bogus value (i.e. 1) to a real
> value, and now we can actually set offsetStopLimit properly.
>
> - TruncateMultiXact no longer calls find_multixact_start when there
> are no remaining multixacts. This is actually a completely separate
> bug that goes all the way back to 9.3.0 and can potentially cause
> TruncateMultiXact to remove every file in pg_multixact/offsets.
> Restarting the cluster becomes impossible because TrimMultiXact barfs.
>
> - TruncateMultiXact now logs a message if the oldest multixact does
> not precede the earliest one on disk and is not equal to the next
> multixact and yet does not exist. The value of the log message is
> that it discovered the bug mentioned in the previous line, so I think
> it's earning its keep.
>
> With this version, I'm able to see that when you start up a
> 9.3.latest+this patch with a cluster that has a bogus value of 1 in
> relminmxid, datminmxid, and the control file, autovacuum vacuums
> everything in sight, all the values get set back to the right thing,
> and the next checkpoint enables the member-wraparound guards. This
> works with both autovacuum=on and autovacuum=off; the emergency
> mechanism kicks in as intended. We'll want to warn people with big
> databases who upgrade to 9.3.0 - 9.3.4 via pg_upgrade that they may
> want to pre-vacuum those tables before upgrading to avoid a vacuum
> storm. But generally I'm pretty happy with this: forcing those values
> to get fixed so that we can guard against member-space wraparound
> seems like the right thing to do.
>
> So, to summarize, this patch does the following:
>
> - Fixes the failure-to-start problems introduced in 9.4.2 in
> complicated pg_upgrade scenarios.
> - Prevents the new calls to find_multixact_start we added in 9.4.2
> from happening during recovery, where they can only create failure
> scenarios. The call in TruncateMultiXact that has been there all
> along is not eliminated, but now handles failure more gracefully.
> - Fixes possible incorrect removal of every single
> pg_multixact/offsets file when no multixacts exist; one file should be
> kept.
> - Forces aggressive autovacuuming when the control file's
> oldestMultiXid doesn't point to a valid MultiXact, and enables the
> member-wraparound guards at the next checkpoint following the
> correction of that problem.

With this patch applied, when I run the script
"checkpoint-segment-boundary.sh" from
http://www.postgresql.org/message-id/CAEepm=1_KbHGbmPVmkUGE5qTP+B4efoCJYS0unGo-Mc5NV=UDg@mail.gmail.com
I see the following in the log during the shutdown checkpoint:

LOG: could not truncate directory "pg_multixact/offsets": apparent wraparound

That message comes from SimpleLruTruncate.
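
For anyone following along, here is a rough Python model of the check
that, as far as I understand the SLRU code, produces that message. It
is a sketch and not the actual C source: the function names are mine,
and the constants assume the default 8 kB block size (2048 four-byte
offsets per page) and 32 pages per SLRU segment. The idea is that
SimpleLruTruncate rounds the cutoff page down to a segment boundary and
refuses to truncate if the latest page in use appears to precede that
cutoff under a wraparound-aware comparison.

```python
# Simplified model of the "apparent wraparound" check; names are illustrative.
# Assumed constants: default 8 kB blocks, 4-byte offsets, 32 pages per segment.
MULTIXACT_OFFSETS_PER_PAGE = 8192 // 4
SLRU_PAGES_PER_SEGMENT = 32
FIRST_MULTIXACT_ID = 1
UINT32_MASK = 0xFFFFFFFF


def multixact_id_precedes(multi1, multi2):
    """Modulo-2^32 comparison: true if multi1 is logically before multi2."""
    diff = (multi1 - multi2) & UINT32_MASK
    # Equivalent to testing (int32) (multi1 - multi2) < 0 in C.
    return diff >= 0x80000000


def offset_page_precedes(page1, page2):
    """Compare two offsets pages via the first multixact each would hold."""
    multi1 = (page1 * MULTIXACT_OFFSETS_PER_PAGE + FIRST_MULTIXACT_ID) & UINT32_MASK
    multi2 = (page2 * MULTIXACT_OFFSETS_PER_PAGE + FIRST_MULTIXACT_ID) & UINT32_MASK
    return multixact_id_precedes(multi1, multi2)


def would_refuse_truncation(latest_page, cutoff_page):
    """True when the model would log 'apparent wraparound' and skip truncation."""
    # Truncation works on whole segments, so round the cutoff down first.
    cutoff_page -= cutoff_page % SLRU_PAGES_PER_SEGMENT
    # If the newest page looks older than the cutoff, something is off.
    return offset_page_precedes(latest_page, cutoff_page)
```

In this model a latest page of 31 with a cutoff page of 32 is refused,
because page 31 precedes the first page of the next segment. Whether
that is what the script is hitting is not something this sketch
establishes.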

--
Thomas Munro
http://www.enterprisedb.com
