Skip site navigation (1) Skip section navigation (2)

Re: BUG #7500: hot-standby replica crash after an initial rsync

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Subject: Re: BUG #7500: hot-standby replica crash after an initial rsync
Date: 2012-08-21 12:58:33
Message-ID: 201208211458.34085.andres@2ndquadrant.com (view raw or flat)
Thread:
Lists: pgsql-bugs
Hi,


On Tuesday, August 21, 2012 02:28:54 PM Maxim Boguk wrote:
> On Tue, Aug 21, 2012 at 8:08 PM, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> wrote:
> >> > I have kept all that database files for the future investigation.
> >> > 
> >> > What I should look into first?
> >> 
> >> Could you reproduce the error with log_error_verbosity=verbose? Or even
> >> better
> >> provide a backtrace with gdb?
> > 
> > There log with log_error_verbosity=verbose:
> LOG:  00000: database system was interrupted while in recovery at log time
> 2012-08-21 08:21:42 MSK
> HINT:  If this has occurred more than once some data might be corrupted and
> you might need to choose an earlier recovery target.
> LOCATION:  StartupXLOG, xlog.c:5757
> DEBUG:  00000: standby_mode = 'on'
> LOCATION:  readRecoveryCommandFile, xlog.c:5269
> DEBUG:  00000: primary_conninfo = 'host=10.12.22.19 port=5433 user=replica
> password=acilper'
> LOCATION:  readRecoveryCommandFile, xlog.c:5276
> DEBUG:  00000: trigger_file = '/db/data/failover'
> LOCATION:  readRecoveryCommandFile, xlog.c:5283
> LOG:  00000: entering standby mode
> LOCATION:  StartupXLOG, xlog.c:5827
> DEBUG:  00000: checkpoint record is at 925/2D41CAA0
> LOCATION:  StartupXLOG, xlog.c:5894
> DEBUG:  00000: redo record is at 925/2C0004C8; shutdown FALSE
> LOCATION:  StartupXLOG, xlog.c:5929
> DEBUG:  00000: next transaction ID: 0/1768511856; next OID: 59880938
> LOCATION:  StartupXLOG, xlog.c:5933
> DEBUG:  00000: next MultiXactId: 361788644; next MultiXactOffset: 822580965
> LOCATION:  StartupXLOG, xlog.c:5936
> DEBUG:  00000: oldest unfrozen transaction ID: 1569182525, in database
> 16424 LOCATION:  StartupXLOG, xlog.c:5939
> DEBUG:  00000: transaction ID wrap limit is 3716666172, limited by database
> with OID 16424
> LOCATION:  SetTransactionIdLimit, varsup.c:334
> DEBUG:  00000: initializing for hot standby
> LOCATION:  StartupXLOG, xlog.c:6070
> DEBUG:  00000: my backend id is 1
> LOCATION:  SharedInvalBackendInit, sinvaladt.c:326
> LOG:  00000: redo starts at 925/2C0004C8
> LOCATION:  StartupXLOG, xlog.c:6199
> DEBUG:  00000: record known xact 1768508078 latestObservedXid 0
> CONTEXT:  xlog redo commit: 2012-08-21 08:21:42.079655+04
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508078 latestObservedXid 0
> CONTEXT:  xlog redo commit: 2012-08-21 08:21:42.079655+04
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: remove KnownAssignedXid 1768508078
> CONTEXT:  xlog redo commit: 2012-08-21 08:21:42.079655+04
> LOCATION:  KnownAssignedXidsRemove, procarray.c:2869
> DEBUG:  00000: record known xact 1768508080 latestObservedXid 0
> CONTEXT:  xlog redo exclusive_lock: xid 1768508080 rel 1663/16424/17282;
> tid 725829/8
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508080 latestObservedXid 0
> CONTEXT:  xlog redo update: rel 1663/16424/17282; tid 725829/8; new
> 726017/7 LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508076 latestObservedXid 0
> CONTEXT:  xlog redo exclusive_lock: xid 1768508076 rel 1663/16424/47760635;
> tid 3075/136
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508080 latestObservedXid 0
> CONTEXT:  xlog redo insert: rel 1663/16424/47747; tid 43866/197
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508080 latestObservedXid 0
> CONTEXT:  xlog redo insert: rel 1663/16424/52546872; tid 44700/276
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508061 latestObservedXid 0
> CONTEXT:  xlog redo log: rel 1663/16424/19211
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508082 latestObservedXid 0
> CONTEXT:  xlog redo exclusive_lock: xid 1768508082 rel 1663/16424/4759107;
> tid 1098/7
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508076 latestObservedXid 0
> CONTEXT:  xlog redo hot_update: rel 1663/16424/47760635; tid 3075/136; new
> 3075/140
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508080 latestObservedXid 0
> CONTEXT:  xlog redo exclusive_lock: xid 1768508080 rel 1663/16424/17282;
> tid 725918/2
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508082 latestObservedXid 0
> CONTEXT:  xlog redo hot_update: rel 1663/16424/4759107; tid 1098/7; new
> 1098/39
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508080 latestObservedXid 0
> CONTEXT:  xlog redo update: rel 1663/16424/17282; tid 725918/2; new
> 725918/6 LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508080 latestObservedXid 0
> CONTEXT:  xlog redo insert: rel 1663/16424/47747; tid 43866/201
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508080 latestObservedXid 0
> CONTEXT:  xlog redo insert: rel 1663/16424/52546872; tid 44700/276
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508080 latestObservedXid 0
> CONTEXT:  xlog redo commit: 2012-08-21 08:21:42.082508+04
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508080 latestObservedXid 0
> CONTEXT:  xlog redo commit: 2012-08-21 08:21:42.082508+04
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: remove KnownAssignedXid 1768508080
> CONTEXT:  xlog redo commit: 2012-08-21 08:21:42.082508+04
> LOCATION:  KnownAssignedXidsRemove, procarray.c:2869
> DEBUG:  00000: record known xact 1768508061 latestObservedXid 0
> CONTEXT:  xlog redo insert: rel 1663/16424/64654; tid 178183/44
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508061 latestObservedXid 0
> CONTEXT:  xlog redo insert: rel 1663/16424/95419; tid 38650/318
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> DEBUG:  00000: record known xact 1768508061 latestObservedXid 0
> 
> 
> CONTEXT:  xlog redo create multixact 361788644 offset 822580965: 1768508061
> 1768508078
> LOCATION:  RecordKnownAssignedTransactionIds, procarray.c:2375
> FATAL:  XX000: could not access status of transaction 361788644
> DETAIL:  Could not read from file "pg_multixact/offsets/1590" at offset
> 114688: Unknown error: 0.
> CONTEXT:  xlog redo create multixact 361788644 offset 822580965: 1768508061
> 1768508078
> LOCATION:  SlruReportIOError, slru.c:840
The strange thing here is the "unknown error: 0" bit. That indicates that 
slru_errno isn't properly set...

> 
> I have some problems with debug startup process with gdb...
> I following next sequence of commands (and got no useful results):
Youre debugging the postmaster that way. The easiest way would be to just 
attach to the startup process with gdb -p. Not sure if you can manage that 
timingwise without manually adding sleeps.

Otherwise you probably will have to work with "set follow-fork-mode ask" in 
gdb. Getting to the right child isn't easy though.

> [pgsql(at)db1 ~/tmp/postgresql-9.0.8]$ /usr/local/bin/gdb
> /usr/local/bin/postgres
> ...
> Reading symbols from /usr/local/bin/postgres...done.
> (gdb) b exit
> Breakpoint 1 at 0x44d38c
> (gdb) run -D /db/data_back/
> Starting program: /usr/local/bin/postgres -D /db/data_back/
> 
> Program received signal SIGUSR1, User defined signal 1.
> 0x00000008018ddadc in select () from /lib/libc.so.7
> (gdb) bt
> #0  0x00000008018ddadc in select () from /lib/libc.so.7
> #1  0x000000000065641c in ServerLoop () at postmaster.c:1396
> #2  0x0000000000655d05 in PostmasterMain (argc=3, argv=0x7fffffffead0) at
> postmaster.c:1097
> #3  0x00000000005d974a in main (argc=3, argv=0x7fffffffead0) at main.c:188
> (gdb) cont
> Continuing.
handle SIGUSR1 nohandle noprint
is useful to avoid this btw.

> Breakpoint 1, 0x000000080186bab0 in exit () from /lib/libc.so.7
> (gdb) bt
> #0  0x000000080186bab0 in exit () from /lib/libc.so.7
> #1  0x000000000067d0a6 in proc_exit (code=1) at ipc.c:137
> #2  0x0000000000659933 in ExitPostmaster (status=1) at postmaster.c:4172
> #3  0x0000000000657d2d in reaper (postgres_signal_arg=20) at
> postmaster.c:2354
> #4  <signal handler called>
> #5  <signal handler called>
> #6  0x00000008018ddadc in select () from /lib/libc.so.7
> #7  0x000000000065641c in ServerLoop () at postmaster.c:1396
> #8  0x0000000000655d05 in PostmasterMain (argc=3, argv=0x7fffffffead0) at
> postmaster.c:1097
> #9  0x00000000005d974a in main (argc=3, argv=0x7fffffffead0) at main.c:188
> (gdb) cont
> Continuing.
> [Inferior 1 (process 98857) exited with code 01]
> (gdb) quit

Andres
-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


In response to

Responses

pgsql-bugs by date

Next:From: Maxim BogukDate: 2012-08-21 13:30:44
Subject: Re: BUG #7500: hot-standby replica crash after an initial rsync
Previous:From: Maxim BogukDate: 2012-08-21 12:28:54
Subject: Re: BUG #7500: hot-standby replica crash after an initial rsync

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group