9.4 pg_control corruption

From: Steve Singer <steve(at)ssinger(dot)info>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: 9.4 pg_control corruption
Date: 2014-07-09 01:41:00
Message-ID: BLU436-SMTP2539162CBA275312AE14DFADC0F0@phx.gbl
Lists: pgsql-hackers

I've encountered a corrupt pg_control file on my 9.4 development
cluster. I've mostly been using the cluster for changeset extraction /
slony testing.

This is a 9.4 build (currently commit 6ad903d70a440e + a walsender change
discussed in another thread), but the initdb would have been done with an
earlier 9.4 snapshot.

/usr/local/pgsql94wal/bin$ ./pg_controldata ../data
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting. The results below are untrustworthy.

pg_control version number: 937
Catalog version number: 201405111
Database system identifier: 6014096177254975326
Database cluster state: in production
pg_control last modified: Tue 08 Jul 2014 06:15:58 PM EDT
Latest checkpoint location: 5/44DC5FC8
Prior checkpoint location: 5/44C58B88
Latest checkpoint's REDO location: 5/44DC5FC8
Latest checkpoint's REDO WAL file: 000000010000000500000044
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/1558590
Latest checkpoint's NextOID: 505898
Latest checkpoint's NextMultiXactId: 3285
Latest checkpoint's NextMultiOffset: 6569
Latest checkpoint's oldestXID: 1281
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint: Tue 08 Jul 2014 06:15:23 PM EDT
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: logical
Current wal_log_hints setting: off
Current max_connections setting: 200
Current max_worker_processes setting: 8
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 65793
Date/time type storage: floating-point numbers
Float4 argument passing: by reference
Float8 argument passing: by reference
Data page checksum version: 2602751502
ssinger(at)ssinger-laptop:/usr/local/pgsql94wal/bin$
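
Even though the output above is flagged as untrustworthy, some fields can be cross-checked against each other. A minimal Python sketch, assuming the 16 MB segment size reported above and the WAL file naming used since 9.3; the helper names parse_lsn and wal_file_name are made up for illustration and are not PostgreSQL functions:

```python
# Assumption: segment size taken from "Bytes per WAL segment" above.
WAL_SEGMENT_SIZE = 16777216


def parse_lsn(text):
    """Turn an LSN such as '5/44DC5FC8' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline, lsn, segment_size=WAL_SEGMENT_SIZE):
    """Name of the WAL segment containing lsn (9.3-and-later naming assumed)."""
    segments_per_xlogid = 0x100000000 // segment_size
    segno = lsn // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
    )


# REDO location and TimeLineID as printed by pg_controldata.
print(wal_file_name(1, parse_lsn("5/44DC5FC8")))
```

This prints 000000010000000500000044, which agrees with the "REDO WAL file" line, so at least that part of the file looks internally consistent.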

Before this, postgres crashed and seemed to have problems recovering. I
might have hit CTRL-C, but I didn't do anything drastic like issue a kill -9.

test1 [unknown] 2014-07-08 18:15:18.986 EDTFATAL: the database system
is in recovery mode
test1 [unknown] 2014-07-08 18:15:20.482 EDTWARNING: terminating
connection because of crash of another server process
test1 [unknown] 2014-07-08 18:15:20.482 EDTDETAIL: The postmaster has
commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
test1 [unknown] 2014-07-08 18:15:20.482 EDTHINT: In a moment you should
be able to reconnect to the database and repeat your command.
2014-07-08 18:15:20.483 EDTLOG: all server processes terminated;
reinitializing
2014-07-08 18:15:20.720 EDTLOG: database system was interrupted;
last known up at 2014-07-08 18:15:15 EDT
2014-07-08 18:15:20.865 EDTLOG: database system was not properly
shut down; automatic recovery in progress
2014-07-08 18:15:20.954 EDTLOG: redo starts at 5/41023848
2014-07-08 18:15:23.153 EDTLOG: unexpected pageaddr 4/D8DC6000 in
log segment 000000010000000500000044, offset 14442496
2014-07-08 18:15:23.153 EDTLOG: redo done at 5/44DC5F60
2014-07-08 18:15:23.153 EDTLOG: last completed transaction was at
log time 2014-07-08 18:15:17.874937-04
test2 [unknown] 2014-07-08 18:15:24.247 EDTFATAL: the database system
is in recovery mode
test2 [unknown] 2014-07-08 18:15:24.772 EDTFATAL: the database system
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.281 EDTFATAL: the database system
is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.547 EDTFATAL: the database system
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.548 EDTFATAL: the database system
is in recovery mode
test3 [unknown] 2014-07-08 18:15:25.549 EDTFATAL: the database system
is in recovery mode
test4 [unknown] 2014-07-08 18:15:25.557 EDTFATAL: the database system
is in recovery mode
test5 [unknown] 2014-07-08 18:15:25.582 EDTFATAL: the database system
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.584 EDTFATAL: the database system
is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.618 EDTFATAL: the database system
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.619 EDTFATAL: the database system
is in recovery mode
test3 [unknown] 2014-07-08 18:15:25.621 EDTFATAL: the database system
is in recovery mode
test4 [unknown] 2014-07-08 18:15:25.622 EDTFATAL: the database system
is in recovery mode
test5 [unknown] 2014-07-08 18:15:25.623 EDTFATAL: the database system
is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.624 EDTFATAL: the database system
is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.633 EDTFATAL: the database system
is in recovery mode
^C 2014-07-08 18:15:52.316 EDTLOG: received fast shutdown request
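
The "unexpected pageaddr" line in the log above can be checked by working out which page address recovery should have found at that segment and offset. A minimal Python sketch, again assuming 16 MB segments and the 9.3-and-later file naming; segment_start_lsn and format_lsn are hypothetical helper names, not PostgreSQL functions:

```python
# Assumption: 16 MB WAL segments, as reported by pg_controldata.
WAL_SEGMENT_SIZE = 16777216


def segment_start_lsn(file_name, segment_size=WAL_SEGMENT_SIZE):
    """Start LSN of a 24-hex-digit WAL segment name (timeline, log id, segment)."""
    log_id = int(file_name[8:16], 16)
    seg = int(file_name[16:24], 16)
    segments_per_xlogid = 0x100000000 // segment_size
    return (log_id * segments_per_xlogid + seg) * segment_size


def format_lsn(lsn):
    """Render a 64-bit LSN in the usual high/low hexadecimal form."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


# Segment name and offset copied from the "unexpected pageaddr" log line.
expected = segment_start_lsn("000000010000000500000044") + 14442496
print(format_lsn(expected))
```

This prints 5/44DC6000, the page boundary just past "redo done at 5/44DC5F60". The address actually found, 4/D8DC6000, has the same offset within its segment but belongs to an older segment, which would be consistent with a recycled WAL file still holding old contents beyond the end of valid WAL.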

The core file in gdb shows
Core was generated by `postgres: autovacuum w'.
Program terminated with signal 6, Aborted.
#0 0x00007f18be8af295 in ?? ()
(gdb) where
#0 0x00007f18be8af295 in ?? ()
#1 0x00007f18be8b2438 in ?? ()
#2 0x0000000000000020 in ?? ()
#3 0x0000000000000000 in ?? ()

I can't rule out that the hardware on my laptop is misbehaving, but I
haven't noticed any other problems doing non-9.4 stuff.

Has anyone seen anything similar with 9.4? Is there anything specific I
should investigate? (I don't care about recovering the cluster.)
