Re: Assertion failure at standby promotion

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Assertion failure at standby promotion
Date: 2013-05-05 09:13:47
Message-ID: CA+HiwqFOrBT2iURbzrUYA6bWA4-hs39H7qwHtFuz5TBKBwb8qw@mail.gmail.com
Lists: pgsql-hackers

Hello,

I tried reproducing the scenario. Note that I did not archive xlogs
(that is, archive_command = '/bin/true' and the corresponding
restore_command = '/bin/false'). I performed the steps you mentioned
and observed the following:

****** Log on Standby-1:
[Standby-1]LOG: database system was interrupted; last known up at 2013-05-05 14:05:08 IST
[Standby-1]LOG: creating missing WAL directory "pg_xlog/archive_status"
[Standby-1]LOG: entering standby mode
[Standby-1]LOG: started streaming WAL from primary at 0/2000000 on timeline 1
[Standby-1]LOG: redo starts at 0/2000024
[Standby-1]LOG: consistent recovery state reached at 0/20000DC
[Standby-1]LOG: database system is ready to accept read only connections
[Standby-1]LOG: received promote request
[Standby-1]FATAL: terminating walreceiver process due to administrator command
[Standby-1]LOG: invalid magic number 0000 in log segment 000000010000000000000003, offset 5316608
[Standby-1]LOG: redo done at 0/350F0B8
[Standby-1]LOG: last completed transaction was at log time 2013-05-05 14:05:14.571492+05:30
[Standby-1]LOG: selected new timeline ID: 2
[Standby-1]LOG: archive recovery complete
>> [Standby-1]ERROR: server switched off timeline 1 at 0/3510B14, but walsender already streamed up to 0/3512000
[Standby-1]LOG: database system is ready to accept connections
[Standby-1]LOG: autovacuum launcher started
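
As a side note on the numbers in the log above, here is a minimal sketch of the
LSN arithmetic. It assumes the default 16 MB WAL segment size and the usual
segment file naming (8 hex digits each for timeline, log id and segment id);
the helper names are hypothetical and are not PostgreSQL functions. Under those
assumptions, the offset reported with the invalid magic number falls at the
same position the walsender had already streamed up to, a little past the
timeline switch point.

```python
# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Convert an LSN printed as 'X/Y' (two hex numbers) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value):
    """Format a 64-bit integer position back into the 'X/Y' form used in logs."""
    return "%X/%X" % (value >> 32, value & 0xFFFFFFFF)


def segment_offset_to_lsn(segment_name, offset, segment_size=WAL_SEGMENT_SIZE):
    """Map (segment file name, byte offset) to an absolute WAL position.

    Assumed name layout: chars 0-7 timeline, 8-15 log id, 16-23 segment id.
    """
    log_id = int(segment_name[8:16], 16)
    seg_id = int(segment_name[16:24], 16)
    segments_per_log = 0x100000000 // segment_size
    segno = log_id * segments_per_log + seg_id
    return segno * segment_size + offset


if __name__ == "__main__":
    # Position of the "invalid magic number" message from the log.
    bad_page = segment_offset_to_lsn("000000010000000000000003", 5316608)
    print(format_lsn(bad_page))  # 0/3512000

    # Bytes streamed past the point where timeline 1 was switched off.
    gap = parse_lsn("0/3512000") - parse_lsn("0/3510B14")
    print(gap)  # 5356
```

So, under the stated assumptions, the walsender was about 5 kB (5356 bytes)
ahead of the switch point 0/3510B14 when the promotion happened.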

****** Log on Standby-2:
[Standby-2]LOG: database system was interrupted while in recovery at log time 2013-05-05 14:05:07 IST
[Standby-2]HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
[Standby-2]LOG: creating missing WAL directory "pg_xlog/archive_status"
[Standby-2]LOG: entering standby mode
[Standby-2]LOG: started streaming WAL from primary at 0/2000000 on timeline 1
[Standby-2]LOG: redo starts at 0/2000024
[Standby-2]LOG: consistent recovery state reached at 0/3000000
[Standby-2]LOG: database system is ready to accept read only connections
>> [Standby-2]FATAL: could not receive data from WAL stream: ERROR: server switched off timeline 1 at 0/3510B14, but walsender already streamed up to 0/3512000

[Standby-2]LOG: invalid magic number 0000 in log segment 000000010000000000000003, offset 5316608
[Standby-2]LOG: fetching timeline history file for timeline 2 from primary server
[Standby-2]LOG: started streaming WAL from primary at 0/3000000 on timeline 1
[Standby-2]LOG: replication terminated by primary server
[Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14
[Standby-2]LOG: restarted WAL streaming at 0/3000000 on timeline 1
[Standby-2]LOG: replication terminated by primary server
[Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14
[Standby-2]LOG: restarted WAL streaming at 0/3000000 on timeline 1
[Standby-2]LOG: replication terminated by primary server
[Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14
[Standby-2]LOG: restarted WAL streaming at 0/3000000 on timeline 1
[Standby-2]LOG: replication terminated by primary server
[Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14
[Standby-2]LOG: restarted WAL streaming at 0/3000000 on timeline 1
[Standby-2]LOG: replication terminated by primary server
[Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14
[Standby-2]LOG: restarted WAL streaming at 0/3000000 on timeline 1
[Standby-2]LOG: replication terminated by primary server
[Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14
[Standby-2]LOG: restarted WAL streaming at 0/3000000 on timeline 1
[Standby-2]LOG: replication terminated by primary server
[Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14
[Standby-2]LOG: restarted WAL streaming at 0/3000000 on timeline 1
[Standby-2]LOG: replication terminated by primary server
[Standby-2]DETAIL: End of WAL reached on timeline 1 at 0/3510B14
...
...
...

****** Also, the ps output shows the following state for the wal sender
(on Standby-1) and the wal receiver (on Standby-2):

amit 8084 5675 0 14:13 ? 00:00:00 postgres: wal receiver process restarting at 0/3000000
amit 8085 5648 0 14:13 ? 00:00:00 postgres: wal sender process amit [local] idle

Is this related to the assertion failure that you reported?

--

Amit Langote
