Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages

From: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>
To: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2013-12-19 22:37:04
Message-ID: CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jXe6V4R7Hxmg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Hi,

Linux 2.6.32
PostgreSQL 9.2.6

It was suffering from this problem on 9.2.4, mostly last couple of
weeks when I had to rebuild the replica almost every 3 days, and I
hoped it would be fixed in 9.2.6, but it is not.

Yesterday I upgraded both master and replica servers to 9.2.6,
performed SET vacuum_freeze_table_age = 0; VACUUM; and then rebuild
the replica from scratch.

Below are replica logs, pg_controldata output, configuration and the
backtrace of the core dump. I can keep the replica untouched (without
rebuilding it) for several more hours, so please let me know if you
need me to show you something else.

Replica logs:

2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING:
page 14833 of relation base/16436/3321003988 is uninitialized
2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
lastBlockVacuumed 0
2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] PANIC: WAL
contains references to invalid pages
2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
lastBlockVacuumed 0
2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG: startup
process (PID 19938) was terminated by signal 6: Aborted
2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG:
terminating any other active server processes

pg_controldata output:

pg_control version number: 922
Catalog version number: 201204301
Database system identifier: 5858109675396804534
Database cluster state: in archive recovery
pg_control last modified: Thu 19 Dec 2013 08:37:31 PM MSK
Latest checkpoint location: 3271/79F16848
Prior checkpoint location: 3270/77E32598
Latest checkpoint's REDO location: 3270/929671E0
Latest checkpoint's TimeLineID: 2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 4/1572485244
Latest checkpoint's NextOID: 3349674903
Latest checkpoint's NextMultiXactId: 6730476
Latest checkpoint's NextMultiOffset: 30866630
Latest checkpoint's oldestXID: 1340099111
Latest checkpoint's oldestXID's DB: 16436
Latest checkpoint's oldestActiveXID: 1572038146
Time of latest checkpoint: Thu 19 Dec 2013 07:00:09 PM MSK
Minimum recovery ending location: 3272/666EE368
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current max_connections setting: 550
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value

Non default configuration:

application_name = psql
archive_command = (disabled)
autovacuum_analyze_scale_factor = 0.05
autovacuum_max_workers = 5
autovacuum_naptime = 5
autovacuum_vacuum_cost_delay = 5
autovacuum_vacuum_scale_factor = 0.05
bgwriter_delay = 10
bgwriter_lru_multiplier = 10
checkpoint_completion_target = 0.9
checkpoint_segments = 1024
checkpoint_timeout = 3600
checkpoint_warning = 300
client_encoding = UTF8
DateStyle = ISO, DMY
default_text_search_config = pg_catalog.russian
effective_cache_size = 17432576
effective_io_concurrency = 32
hot_standby = on
hot_standby_feedback = on
lc_collate = ru_RU.UTF-8
lc_ctype = ru_RU.UTF-8
lc_messages = en_US.UTF-8
lc_monetary = ru_RU.UTF-8
lc_numeric = ru_RU.UTF-8
lc_time = ru_RU.UTF-8
listen_addresses = *
log_checkpoints = on
log_directory = /var/log/pgsql
log_file_mode = 0600
log_filename = postgresql-%Y-%m-%d.log
log_line_prefix = %t %p %u(at)%d from %h [vxid:%v txid:%x] [%i]
log_lock_waits = on
log_rotation_size = 0
log_statement = ddl
log_timezone = W-SU
log_truncate_on_rotation = on
logging_collector = on
maintenance_work_mem = 1048576
max_connections = 550
max_stack_depth = 2048
max_standby_streaming_delay = 300000
max_wal_senders = 3
pg_stat_statements.track = all
port = 6432
server_encoding = UTF8
shared_buffers = 4587520
shared_preload_libraries = pg_stat_statements
synchronous_commit = off
TimeZone = W-SU
track_activity_query_size = 4096
transaction_isolation = read committed
unix_socket_permissions = 0777
wal_buffers = 2048
wal_keep_segments = 2048
wal_level = hot_standby
work_mem = 262144

Backtrace of the coredump:

Core was generated by `postgres: startup process recovering
000000020000327200000066 '.
Program terminated with signal 6, Aborted.
#0 0x00007f6a5a8818a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install
postgresql92-server-9.2.6-1PGDG.el6.x86_64
(gdb) set print pretty on
(gdb) bt f
#0 0x00007f6a5a8818a5 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f6a5a883085 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x000000000071c580 in errfinish (dummy=<value optimized out>) at elog.c:534
edata = 0xb88fc0
elevel = 22
oldcontext = 0x134a740
econtext = 0x0
__func__ = "errfinish"
#3 0x000000000071d09c in elog_finish (elevel=<value optimized out>,
fmt=<value optimized out>) at elog.c:1213
edata = 0xb88fc0
oldcontext = 0x134a740
__func__ = "elog_finish"
#4 0x00000000004a6d6d in log_invalid_page (node=...,
forkno=MAIN_FORKNUM, blkno=14833, present=1 '\001') at xlogutils.c:91
key = {
node = {
spcNode = 1663,
dbNode = 16436,
relNode = 3321003988
},
forkno = 25867822,
blkno = 0
}
hentry = <value optimized out>
found = <value optimized out>
__func__ = "log_invalid_page"
#5 0x00000000004a6ea0 in XLogReadBufferExtended (rnode=...,
forknum=MAIN_FORKNUM, blkno=14833, mode=RBM_NORMAL) at xlogutils.c:358
page = <value optimized out>
lastblock = 38539
buffer = <value optimized out>
smgr = <value optimized out>
#6 0x00000000004897a2 in btree_xlog_vacuum (lsn=<value optimized
out>, record=0x7f6a4fb20010) at nbtxlog.c:510
blkno = 14833
xlrec = 0x7f6a4fb20030
buffer = <value optimized out>
page = <value optimized out>
opaque = <value optimized out>
#7 btree_redo (lsn=<value optimized out>, record=0x7f6a4fb20010) at
nbtxlog.c:1062
info = <value optimized out>
__func__ = "btree_redo"
#8 0x00000000004a3468 in StartupXLOG () at xlog.c:6931
recoveryContinue = 1 '\001'
recoveryApply = <value optimized out>
errcontext = {
previous = 0x0,
callback = 0x49d5a0 <rm_redo_error_callback>,
arg = 0x7f6a4fb20010
}
xtime = <value optimized out>
rmid = <value optimized out>
xlogctl = 0x7f6a6de08480
Insert = <value optimized out>
checkPoint = {
redo = {
xlogid = 12896,
xrecoff = 4261478784
},
ThisTimeLineID = 2,
fullPageWrites = 1 '\001',
nextXidEpoch = 4,
nextXid = 1452297001,
nextOid = 3344849815,
nextMulti = 6695025,
nextMultiOffset = 30699263,
oldestXid = 1256929431,
oldestXidDB = 12865,
time = 1387392623,
oldestActiveXid = 1452282892
}
wasShutdown = <value optimized out>
reachedStopPoint = 0 '\000'
haveBackupLabel = <value optimized out>
RecPtr = <value optimized out>
checkPointLoc = {
xlogid = 12896,
xrecoff = 4271100680
}
EndOfLog = <value optimized out>
endLogId = <value optimized out>
endLogSeg = <value optimized out>
record = 0x7f6a4fb20010
freespace = <value optimized out>
oldestActiveXID = <value optimized out>
backupEndRequired = <value optimized out>
backupFromStandby = <value optimized out>
dbstate_at_startup = <value optimized out>
__func__ = "StartupXLOG"
#9 0x00000000006153b8 in StartupProcessMain () at startup.c:220
__func__ = "StartupProcessMain"
#10 0x00000000004c3837 in AuxiliaryProcessMain (argc=2,
argv=0x7fffa02b0100) at bootstrap.c:418
progname = 0x854a1a "postgres"
flag = <value optimized out>
userDoption = 0x0
__func__ = "AuxiliaryProcessMain"
#11 0x00000000006105f3 in StartChildProcess (type=StartupProcess) at
postmaster.c:4522
pid = <value optimized out>
av = {0x854a1a "postgres", 0x7fffa02b0150 "-x2", 0x0,
0x7fffa02f0814
"H\213\025e\376\377\377H+B(\213J8H#B0H\017\257\301\213J<H\323\370D;\"u\245H\003C\b1\322\061\311H\213\063H=\377ɚ;v\033\017\037\204",
0x7fffa02b0170 "\300\226\064\001", 0x0, 0x7fffa02b0150 "-x2",
0x7fffa02f0814
"H\213\025e\376\377\377H+B(\213J8H#B0H\017\257\301\213J<H\323\370D;\"u\245H\003C\b1\322\061\311H\213\063H=\377ɚ;v\033\017\037\204",
0xb41080 ">-\262R",
0x0}
ac = 2
typebuf =
"-x2\000\377\177\000\000\274\b/\240\377\177\000\000\005\000\000\000\000\000\000\000\005\000\000\000\000\000\000"
__func__ = "StartChildProcess"
#12 0x0000000000614c3c in PostmasterMain (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:1123
opt = <value optimized out>
status = <value optimized out>
userDoption = <value optimized out>
listen_addr_saved = 1 '\001'
i = <value optimized out>
__func__ = "PostmasterMain"
#13 0x00000000005b37e0 in main (argc=5, argv=0x13496a0) at main.c:199
No locals.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray(dot)ru(at)gmail(dot)com

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2013-12-19 23:04:07 Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Previous Message Alvaro Herrera 2013-12-19 15:58:07 Re: Update with subselect sometimes returns wrong result

Browse pgsql-hackers by date

  From Date Subject
Next Message Gavin Flower 2013-12-19 22:52:10 Re: gaussian distribution pgbench
Previous Message Joseph Kregloh 2013-12-19 21:55:34 Re: pg_upgrade & tablespaces