Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Steve Kehlet <steve(dot)kehlet(at)gmail(dot)com>, Forums postgresql <pgsql-general(at)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Date: 2015-06-05 18:53:31
Message-ID: 20150605185331.GW133018@postgresql.org
Lists: pgsql-general pgsql-hackers

Robert Haas wrote:
> On Fri, Jun 5, 2015 at 2:20 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > On Thu, Jun 04, 2015 at 05:29:51PM -0400, Robert Haas wrote:
> >> Here's a new version with some more fixes and improvements:
> >
> > I read through this version and found nothing to change. I encourage other
> > hackers to study the patch, though. The surrounding code is challenging.
>
> Andres tested this and discovered that my changes to
> find_multixact_start() were far more creative than intended.
> Committed and back-patched with a trivial fix for that stupidity and a
> novel-length explanation of the changes.

I think novel-length is fine. The bug itself is pretty complicated, and
so is the solution. Many thanks for working through this.

FWIW I tested with the (attached) reproducer script(*) for my customer's
problem, and it works fine now where it failed before. One thing which
surprised me a bit, but in hindsight should have been pretty obvious, is
that the "multixact member protections are fully armed" message is only
printed once the standby gets out of recovery, instead of when it
reaches consistent state or some such earlier point.

(*) Actually the script cheats to get past an issue I couldn't figure
out, namely that a file can't be "seeked": I just "touch" an empty file
there, which produces the same error as in my customer's log.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
repro-chkpt-replay-failure.sh application/x-sh 1.9 KB
