From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Alexey Klyukin <alexk(at)hintbits(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Could not open file pg_multixact/offsets/ ERROR on 9.3.4 |
Date: | 2014-06-04 15:10:45 |
Message-ID: | 20140604151045.GD10482@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2014-06-04 16:59:10 +0200, Alexey Klyukin wrote:
> I've recently discovered a peculiar problem on one of our big databases
> (more than 1TB). The database has been upgraded from 9.2 to 9.3.4 (using
> hardlinks to speedup the process) on April 7th around 22:00 local time.
> When doing vacuum on any table, the system fails with the following error:
>
> ERROR: could not access status of transaction 2035987419
> DETAIL: Could not open file "pg_multixact/offsets/795A": No such file or
> directory.
>
> The erorr doesn't depend on the table being vacuumed, or even database, i.e:
>
> postgres=# create database x;
> CREATE DATABASE
> postgres=# \c x
> You are now connected to database "x" as user "postgres".
> x=# create table test();
> CREATE TABLE
> x=# vacuum test;
> ERROR: could not access status of transaction 2035987419
> DETAIL: Could not open file "pg_multixact/offsets/795A": No such file or
> directory.
>
> The content of pg_multixact/offsets is:
>
> pg_multixact$ ls -lR
> ./members:
> -rw------- 1 postgres postgres 8192 Apr 16 18:20 0000
> ./offsets:
> -rw------- 1 postgres postgres 8192 Apr 7 22:51 0000
> -rw------- 1 postgres postgres 262144 Apr 16 18:20 79A6
>
> the select mutlixact from pg_database gives me:
> and the 2035987419 = 0x795AB3DB belongs to 795A segment.
> The 0000 file just contains a single page of all zeroes. Neither the 9.3.4
> replica of this database, nor the original 9.2 cluster data directory
> contain this file.
Looks like you're hitting the issue described in
http://archives.postgresql.org/message-id/20140530121631.GE25431%40alap3.anarazel.de
> I'm tempted to just remove the 0000 file from master and restart the
> database, since it's effectively impossible to run vacuum now, but I'd like
> to understand what's happening first. Below is the result of pg_filedump
> for the master:
Yes, that's fine in this specific case.
Note that the 0000 segment isn't yused by anything between the oldest
and newest multixact:
> Latest checkpoint's NextMultiXactId: 2040987419
> Latest checkpoint's NextMultiOffset: 3
> Latest checkpoint's oldestXID: 1038291920
> Latest checkpoint's oldestXID's DB: 16415
> Latest checkpoint's oldestActiveXID: 1655189767
> Latest checkpoint's oldestMultiXid: 2040987417
> Latest checkpoint's oldestMulti's DB: 0
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Magnus Hagander | 2014-06-04 15:10:52 | Re: pg_basebackup failed to back up large file |
Previous Message | Tom Lane | 2014-06-04 15:08:42 | Re: pg_control is missing a field for LOBLKSIZE |