Re: Could not open file pg_multixact/offsets/ ERROR on 9.3.4

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Alexey Klyukin <alexk(at)hintbits(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Could not open file pg_multixact/offsets/ ERROR on 9.3.4
Date: 2014-06-04 15:10:45
Message-ID: 20140604151045.GD10482@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2014-06-04 16:59:10 +0200, Alexey Klyukin wrote:
> I've recently discovered a peculiar problem on one of our big databases
> (more than 1TB). The database has been upgraded from 9.2 to 9.3.4 (using
> hardlinks to speedup the process) on April 7th around 22:00 local time.
> When doing vacuum on any table, the system fails with the following error:
>
> ERROR: could not access status of transaction 2035987419
> DETAIL: Could not open file "pg_multixact/offsets/795A": No such file or
> directory.
>
> The erorr doesn't depend on the table being vacuumed, or even database, i.e:
>
> postgres=# create database x;
> CREATE DATABASE
> postgres=# \c x
> You are now connected to database "x" as user "postgres".
> x=# create table test();
> CREATE TABLE
> x=# vacuum test;
> ERROR: could not access status of transaction 2035987419
> DETAIL: Could not open file "pg_multixact/offsets/795A": No such file or
> directory.
>
> The content of pg_multixact/offsets is:
>
> pg_multixact$ ls -lR
> ./members:
> -rw------- 1 postgres postgres 8192 Apr 16 18:20 0000
> ./offsets:
> -rw------- 1 postgres postgres 8192 Apr 7 22:51 0000
> -rw------- 1 postgres postgres 262144 Apr 16 18:20 79A6
>
> the select mutlixact from pg_database gives me:

> and the 2035987419 = 0x795AB3DB belongs to 795A segment.
> The 0000 file just contains a single page of all zeroes. Neither the 9.3.4
> replica of this database, nor the original 9.2 cluster data directory
> contain this file.

Looks like you're hitting the issue described in
http://archives.postgresql.org/message-id/20140530121631.GE25431%40alap3.anarazel.de

> I'm tempted to just remove the 0000 file from master and restart the
> database, since it's effectively impossible to run vacuum now, but I'd like
> to understand what's happening first. Below is the result of pg_filedump
> for the master:

Yes, that's fine in this specific case.

Note that the 0000 segment isn't yused by anything between the oldest
and newest multixact:

> Latest checkpoint's NextMultiXactId: 2040987419
> Latest checkpoint's NextMultiOffset: 3
> Latest checkpoint's oldestXID: 1038291920
> Latest checkpoint's oldestXID's DB: 16415
> Latest checkpoint's oldestActiveXID: 1655189767
> Latest checkpoint's oldestMultiXid: 2040987417
> Latest checkpoint's oldestMulti's DB: 0

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2014-06-04 15:10:52 Re: pg_basebackup failed to back up large file
Previous Message Tom Lane 2014-06-04 15:08:42 Re: pg_control is missing a field for LOBLKSIZE