Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Steve Kehlet <steve(dot)kehlet(at)gmail(dot)com>, Forums postgresql <pgsql-general(at)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Date: 2015-06-03 03:42:55
Message-ID: 20150603034255.GJ2988@postgresql.org
Lists: pgsql-general pgsql-hackers

Thomas Munro wrote:
> On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> > My guess is that the file existed, and perhaps had one or more pages,
> > but the wanted page doesn't exist, so we tried to read but got 0 bytes
> > back. read() returns 0 in this case but doesn't set errno.
> >
> > I didn't find a way to set things so that the file exists but is of
> > shorter contents than oldestMulti by the time the checkpoint record is
> > replayed.
>
> I'm just starting to learn about the recovery machinery, so forgive me
> if I'm missing something basic here, but I just don't get this. As I
> understand it, offsets/0046 should either have been copied with that
> page present in it if it existed before the backup started (apparently
> not in this case), or extended to contain it by WAL records that come
> after the backup label but before the checkpoint record that
> references it (also apparently not in this case).

Exactly --- that's the spot at which I am, also. I have had this
spinning in my head for three days now, and tried every single variation
that I could think of, but like you I was unable to reproduce the issue.
However, our customer took a second base backup and it failed in exactly
the same way, modulo some changes to the counters (the file that
didn't exist was 004B rather than 0046). I'm still at a loss as to what
the failure mode is. We must be missing some crucial detail ...
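
For reference, the read() behaviour guessed at in the quoted text can be
sketched as follows. This is only an illustration, not the server's code:
PAGE_SIZE, read_page() and demo() are hypothetical names, the 8192-byte
page size is an assumption (the default block size), and the file name
0046 is borrowed from the report purely as a label. It shows that a read
positioned past the end of an existing file hands back zero bytes without
raising any error, so the caller has to treat "0 bytes" as its own case.

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumption: default PostgreSQL block size


def read_page(path, pageno):
    """Read one page; return (data, status) with status 'ok', 'short' or 'eof'."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking past end-of-file is allowed and is not an error.
        os.lseek(fd, pageno * PAGE_SIZE, os.SEEK_SET)
        # At or past EOF the read succeeds and returns an empty result;
        # no OSError is raised because the OS reports no failure.
        data = os.read(fd, PAGE_SIZE)
    finally:
        os.close(fd)
    if len(data) == PAGE_SIZE:
        return data, "ok"
    if len(data) == 0:
        return data, "eof"
    return data, "short"


def demo():
    """Create a one-page file, then read page 0 (present) and page 1 (absent)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "0046")  # illustrative name only
        with open(path, "wb") as f:
            f.write(b"\0" * PAGE_SIZE)  # the file holds exactly one page
        _, first = read_page(path, 0)
        _, second = read_page(path, 1)
        return first, second
```

Calling demo() reads the page that exists and then the one that does not;
the second read fails only in the sense that it returns nothing, which is
why an error message built from errno in that situation can be misleading.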

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
