Standby stopped working after PANIC: WAL contains references to invalid pages

From: Dan Kogan <dan(at)iqtell(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Standby stopped working after PANIC: WAL contains references to invalid pages
Date: 2013-06-22 19:43:38
Message-ID: 60B572D9298D944580F7D51195DD3080468902FFCA@VMBX125.ihostexchange.net
Lists: pgsql-general

Hello,

Today our standby instance stopped working with this error in the log:

2013-06-22 16:27:32 UTC [8367]: [247-1] [] WARNING: page 158130 of relation pg_tblspc/16447/PG_9.2_201204301/16448/39154429 is uninitialized
2013-06-22 16:27:32 UTC [8367]: [248-1] [] CONTEXT: xlog redo vacuum: rel 16447/16448/39154429; blk 158134, lastBlockVacuumed 158129
2013-06-22 16:27:32 UTC [8367]: [249-1] [] PANIC: WAL contains references to invalid pages
2013-06-22 16:27:32 UTC [8367]: [250-1] [] CONTEXT: xlog redo vacuum: rel 16447/16448/39154429; blk 158134, lastBlockVacuumed 158129
2013-06-22 16:27:32 UTC [8366]: [3-1] [] LOG: startup process (PID 8367) was terminated by signal 6: Aborted
2013-06-22 16:27:32 UTC [8366]: [4-1] [] LOG: terminating any other active server processes
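To inspect the page named in the WARNING on disk, you first need to know which segment file holds it and at what offset. Below is a minimal sketch. It assumes the sizes that pg_controldata reports further down: 8192-byte blocks and 131072 blocks per segment, i.e. 1 GB files. Segment 0 is the bare relfilenode, and later segments get a ".1", ".2", ... suffix. The helper name block_location is hypothetical.

```python
# Hypothetical helper: locate a relation block on disk.
# Assumes BLCKSZ = 8192 and 131072 blocks per segment, matching
# "Database block size" and "Blocks per segment of large relation".
BLOCK_SIZE = 8192
BLOCKS_PER_SEGMENT = 131072


def block_location(rel_path: str, block: int) -> tuple:
    """Return (segment file path, byte offset within that file) for a block."""
    segno, block_in_seg = divmod(block, BLOCKS_PER_SEGMENT)
    # Segment 0 has no suffix; segment N is stored in "<relfilenode>.N".
    path = rel_path if segno == 0 else f"{rel_path}.{segno}"
    return path, block_in_seg * BLOCK_SIZE


rel = "pg_tblspc/16447/PG_9.2_201204301/16448/39154429"
for blk in (158130, 158134):  # the uninitialized page, and the block being redone
    path, offset = block_location(rel, blk)
    print(blk, path, offset)
```

Both blocks fall in segment file ".1", relative to the data directory. Dividing the offset by 8192 gives a block number you can pass to `dd ... bs=8192 skip=<n> count=1`, which copies out that single page for inspection.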

After a restart, exactly the same error occurred.

We thought we might have hit this bug: http://postgresql.1045698.n5.nabble.com/Completely-broken-replica-after-PANIC-WAL-contains-references-to-invalid-pages-td5750072.html.
However, there is nothing about subtransactions in our log, so it does not appear to be the same issue.

Any advice on how to debug this further, so that we can avoid it in the future, would be appreciated.

Environment:

AWS, High I/O instance (hi1.4xlarge), 60GB RAM

Software and settings:

PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 64-bit

archive_command rsync -a %p slave:/var/lib/postgresql/replication_load/%f
archive_mode on
autovacuum_freeze_max_age 1000000000
autovacuum_max_workers 6
checkpoint_completion_target 0.9
checkpoint_segments 128
checkpoint_timeout 30min
default_text_search_config pg_catalog.english
hot_standby on
lc_messages en_US.UTF-8
lc_monetary en_US.UTF-8
lc_numeric en_US.UTF-8
lc_time en_US.UTF-8
listen_addresses *
log_checkpoints on
log_destination stderr
log_line_prefix %t [%p]: [%l-1] [%h]
log_min_duration_statement -1
log_min_error_statement error
log_min_messages error
log_timezone UTC
maintenance_work_mem 1GB
max_connections 1200
max_standby_streaming_delay 90s
max_wal_senders 5
port 5432
random_page_cost 2
seq_page_cost 1
shared_buffers 4GB
ssl off
ssl_cert_file /etc/ssl/certs/ssl-cert-snakeoil.pem
ssl_key_file /etc/ssl/private/ssl-cert-snakeoil.key
synchronous_commit off
TimeZone UTC
wal_keep_segments 128
wal_level hot_standby
work_mem 8MB
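With the log_line_prefix above (%t timestamp, %p process ID, %l per-process line number, %h client host), each log line can be split into fields. This makes it easy to pull out everything the startup process (PID 8367) logged, for example. Below is a minimal sketch. The regex is written against this particular prefix only, and the names LINE_RE and parse_line are hypothetical.

```python
import re
from typing import Optional

# Matches lines produced with log_line_prefix = '%t [%p]: [%l-1] [%h]'.
# The host field is empty for background processes such as the startup process.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+) "
    r"\[(?P<pid>\d+)\]: \[(?P<line>\d+)-1\] \[(?P<host>[^\]]*)\]\s*"
    r"(?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)


def parse_line(text: str) -> Optional[dict]:
    """Return the prefix fields and message of one log line, or None if it doesn't match."""
    m = LINE_RE.match(text)
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["line"] = int(fields["line"])
    return fields


sample = [
    "2013-06-22 16:27:32 UTC [8367]: [249-1] [] PANIC: WAL contains references to invalid pages",
    "2013-06-22 16:27:32 UTC [8366]: [3-1] [] LOG: startup process (PID 8367) was terminated by signal 6: Aborted",
]
# Keep only what the startup process itself logged.
startup = [p for p in map(parse_line, sample) if p and p["pid"] == 8367]
for p in startup:
    print(p["line"], p["level"], p["msg"])
```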

root(at)ip-10-148-131-236:~# /usr/local/pgsql/bin/pg_controldata /usr/local/pgsql/data
pg_control version number: 922
Catalog version number: 201204301
Database system identifier: 5838668587531239413
Database cluster state: in archive recovery
pg_control last modified: Sat 22 Jun 2013 06:13:07 PM UTC
Latest checkpoint location: 2250/18CA0790
Prior checkpoint location: 2250/18CA0790
Latest checkpoint's REDO location: 224F/E127B078
Latest checkpoint's TimeLineID: 2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 1/2018629527
Latest checkpoint's NextOID: 43086248
Latest checkpoint's NextMultiXactId: 7088726
Latest checkpoint's NextMultiOffset: 20617234
Latest checkpoint's oldestXID: 1690316999
Latest checkpoint's oldestXID's DB: 16448
Latest checkpoint's oldestActiveXID: 2018629527
Time of latest checkpoint: Sat 22 Jun 2013 03:24:05 PM UTC
Minimum recovery ending location: 2251/5EA631F0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current max_connections setting: 1200
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
root(at)ip-10-148-131-236:~#
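To find the archived WAL files in replication_load that cover the stretch between the latest checkpoint's REDO location and the minimum recovery ending location, you can convert those locations into segment file names. Below is a minimal sketch. It assumes the 9.2 naming scheme: 8 hex digits each for the timeline, the high ("log id") half of the location, and the segment number. The segment number is the low half divided by the 16 MB segment size shown in "Bytes per WAL segment". The name wal_file_name is hypothetical.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # "Bytes per WAL segment" from pg_controldata


def wal_file_name(timeline: int, location: str) -> str:
    """Map an 'XXXX/YYYYYYYY' WAL location to its segment file name (9.2 scheme)."""
    log_id_hex, offset_hex = location.split("/")
    log_id = int(log_id_hex, 16)
    seg = int(offset_hex, 16) // WAL_SEGMENT_SIZE
    return f"{timeline:08X}{log_id:08X}{seg:08X}"


# Timeline 2 is "Latest checkpoint's TimeLineID".
print(wal_file_name(2, "224F/E127B078"))  # latest checkpoint's REDO location
print(wal_file_name(2, "2251/5EA631F0"))  # minimum recovery ending location
```

This prints 000000020000224F000000E1 and 00000002000022510000005E. Replay restarts at the REDO location and hits the same error again, so the failing vacuum record is presumably in a segment at or after the first of these files.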

Thanks again.

Dan
