Re: Strange replication problem - segment restored from archive but still requested from master

From: Venkata Balaji N <nag1010(at)gmail(dot)com>
To: Piotr Gasidło <quaker(at)barbara(dot)eu(dot)org>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Strange replication problem - segment restored from archive but still requested from master
Date: 2015-05-22 00:16:27
Message-ID: CAEyp7J-Q1p464XE=u4400TGG62N+Gfh6p-gPwA_a_rCuUZe0+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Fri, May 22, 2015 at 6:52 AM, Piotr Gasidło <quaker(at)barbara(dot)eu(dot)org>
wrote:

> Got strange problem. Unable to repeat, but got logs.
>
> Simple master-slave using streaming replication.
> Master is running. Slave is down.
> Segment 0000000400004C4D00000090 was successfully archived and send
> from master to slave.
>
> Now I've started slave, and:
>
> ay 21 21:23:37 d8 postgres[50645]: [3-1] 2015-05-21 21:23:37.033 CEST
> @ 50645 LOG: database system was shut down in recovery at
> 2015-05-21 21:22:03 CEST
> May 21 21:23:37 d8 postgres[50645]: [4-1] 2015-05-21 21:23:37.034 CEST
> @ 50645 LOG: entering standby mode
> May 21 21:23:37 d8 postgres[50645]: [5-1] 2015-05-21 21:23:37.058 CEST
> @ 50645 LOG: restored log file "0000000400004C4D00000088" from
> archive
> May 21 21:23:37 d8 postgres[50645]: [6-1] 2015-05-21 21:23:37.120 CEST
> @ 50645 LOG: redo starts at 4C4D/88493B50
> May 21 21:23:37 d8 postgres[50645]: [7-1] 2015-05-21 21:23:37.226 CEST
> @ 50645 LOG: restored log file "0000000400004C4D00000089" from
> archive
> May 21 21:23:37 d8 postgres[50645]: [8-1] 2015-05-21 21:23:37.426 CEST
> @ 50645 LOG: restored log file "0000000400004C4D0000008A" from
> archive
> May 21 21:23:37 d8 postgres[50645]: [9-1] 2015-05-21 21:23:37.750 CEST
> @ 50645 LOG: restored log file "0000000400004C4D0000008B" from
> archive
> May 21 21:23:38 d8 postgres[50645]: [10-1] 2015-05-21 21:23:38.376
> CEST @ 50645 LOG: restored log file "0000000400004C4D0000008C" from
> archive
> May 21 21:23:38 d8 postgres[50645]: [11-1] 2015-05-21 21:23:38.690
> CEST @ 50645 LOG: restored log file "0000000400004C4D0000008D" from
> archive
> May 21 21:23:38 d8 postgres[50645]: [12-1] 2015-05-21 21:23:38.855
> CEST @ 50645 LOG: restored log file "0000000400004C4D0000008E" from
> archive
> May 21 21:23:39 d8 postgres[50645]: [13-1] 2015-05-21 21:23:39.275
> CEST @ 50645 LOG: restored log file "0000000400004C4D0000008F" from
> archive
> May 21 21:23:39 d8 postgres[50645]: [14-1] 2015-05-21 21:23:39.654
> CEST @ 50645 LOG: restored log file "0000000400004C4D00000090" from
> archive
> May 21 21:23:40 d8 postgres[50645]: [15-1] 2015-05-21 21:23:40.222
> CEST @ 50645 LOG: consistent recovery state reached at
> 4C4D/90FFF9C8
> May 21 21:23:40 d8 postgres[50644]: [3-1] 2015-05-21 21:23:40.222 CEST
> @ 50644 LOG: database system is ready to accept read only
> connections
> May 21 21:23:40 d8 postgres[50645]: [16-1] 2015-05-21 21:23:40.223
> CEST @ 50645 LOG: unexpected pageaddr 4C46/E000000 in log segment
> 0000000400004C4D00000091, offset 0
> May 21 21:23:40 d8 postgres[50699]: [4-1] 2015-05-21 21:23:40.232 CEST
> @ 50699 LOG: started streaming WAL from primary at 4C4D/90000000 on
> timeline 4
> May 21 21:23:40 d8 postgres[50699]: [5-1] 2015-05-21 21:23:40.232 CEST
> @ 50699 FATAL: could not receive data from WAL stream: ERROR:
> requested WAL segment 0000000400004C4D00000090 has already been
> removed
> May 21 21:23:40 d8 postgres[50645]: [17-1] 2015-05-21 21:23:40.255
> CEST @ 50645 LOG: restored log file "0000000400004C4D00000090" from
> archive
> May 21 21:23:40 d8 postgres[50703]: [4-1] 2015-05-21 21:23:40.268 CEST
> @ 50703 LOG: started streaming WAL from primary at 4C4D/90000000 on
> timeline 4
> May 21 21:23:40 d8 postgres[50703]: [5-1] 2015-05-21 21:23:40.268 CEST
> @ 50703 FATAL: could not receive data from WAL stream: ERROR:
> requested WAL segment 0000000400004C4D00000090 has already been
> removed
> May 21 21:23:40 d8 postgres[50703]: [5-2]
> ...
> (and so on)
>
> So, as I understand:
> - slave was started and entered restore,
> - slave restored 0000000400004C4D00000090 from archive, reached
> consistent recovery state
> - now, it connected to master and noticed, that, it has
> 0000000400004C4D00000091 segment uncomplete
> - and then, started yelling about missing segment (master deleted it
> already after archiving) on master
>
> Why?

It might be yelling about the WAL segment due to the delay in shipping it
from master to slave.
Do you have the restore_command set up in the recovery.conf file ? do you
have any automated job which is shipping WAL archives from master to slave.

Slave - if it does not find the WAL segment in master pg_xlog location, it
looks for the same in the location mentioned in the restore_command. It
yells if it is unable to find it.

Cannot give any concrete comments/suggestions, until recovery.conf entries
are known.

> I fixed it by stopping slave, doing some operations on master, copying
> 0000000400004C4D00000091 and 0000000400004C4D00000092 from master, and
> staring the slave:
>
>
> May 21 21:24:56 d8 postgres[50644]: [4-1] 2015-05-21 21:24:56.160 CEST
> @ 50644 LOG: received fast shutdown request
> May 21 21:24:56 d8 postgres[50644]: [5-1] 2015-05-21 21:24:56.160 CEST
> @ 50644 LOG: aborting any active transactions
> May 21 21:24:56 d8 postgres[50657]: [3-1] 2015-05-21 21:24:56.162 CEST
> @ 50657 LOG: shutting down
> May 21 21:24:56 d8 postgres[50657]: [4-1] 2015-05-21 21:24:56.174 CEST
> @ 50657 LOG: database system is shut down
> May 21 21:28:54 d8 postgres[53574]: [1-1] 2015-05-21 21:28:54.781 CEST
> @ 53574 LOG: loaded library "pg_stat_statements"
> May 21 21:28:55 d8 postgres[53574]: [2-1] 2015-05-21 21:28:55.123 CEST
> @ 53574 LOG: ending log output to stderr
> May 21 21:28:55 d8 postgres[53574]: [2-2] 2015-05-21 21:28:55.123 CEST
> @ 53574 HINT: Future log output will go to log destination
> "syslog".
> May 21 21:28:55 d8 postgres[53575]: [3-1] 2015-05-21 21:28:55.124 CEST
> @ 53575 LOG: database system was shut down in recovery at
> 2015-05-21 21:24:56 CEST
> May 21 21:28:55 d8 postgres[53575]: [4-1] 2015-05-21 21:28:55.126 CEST
> @ 53575 LOG: entering standby mode
> May 21 21:28:55 d8 postgres[53575]: [5-1] 2015-05-21 21:28:55.150 CEST
> @ 53575 LOG: restored log file "0000000400004C4D00000088" from
> archive
> May 21 21:28:55 d8 postgres[53575]: [6-1] 2015-05-21 21:28:55.209 CEST
> @ 53575 LOG: redo starts at 4C4D/88493B50
> May 21 21:28:55 d8 postgres[53575]: [7-1] 2015-05-21 21:28:55.314 CEST
> @ 53575 LOG: restored log file "0000000400004C4D00000089" from
> archive
> May 21 21:28:55 d8 postgres[53575]: [8-1] 2015-05-21 21:28:55.512 CEST
> @ 53575 LOG: restored log file "0000000400004C4D0000008A" from
> archive
> May 21 21:28:55 d8 postgres[53575]: [9-1] 2015-05-21 21:28:55.831 CEST
> @ 53575 LOG: restored log file "0000000400004C4D0000008B" from
> archive
> May 21 21:28:56 d8 postgres[53575]: [10-1] 2015-05-21 21:28:56.508
> CEST @ 53575 LOG: restored log file "0000000400004C4D0000008C" from
> archive
> May 21 21:28:56 d8 postgres[53575]: [11-1] 2015-05-21 21:28:56.818
> CEST @ 53575 LOG: restored log file "0000000400004C4D0000008D" from
> archive
> May 21 21:28:56 d8 postgres[53575]: [12-1] 2015-05-21 21:28:56.982
> CEST @ 53575 LOG: restored log file "0000000400004C4D0000008E" from
> archive
> May 21 21:28:57 d8 postgres[53575]: [13-1] 2015-05-21 21:28:57.415
> CEST @ 53575 LOG: restored log file "0000000400004C4D0000008F" from
> archive
> May 21 21:28:57 d8 postgres[53575]: [14-1] 2015-05-21 21:28:57.790
> CEST @ 53575 LOG: restored log file "0000000400004C4D00000090" from
> archive
> May 21 21:28:58 d8 postgres[53575]: [15-1] 2015-05-21 21:28:58.328
> CEST @ 53575 LOG: consistent recovery state reached at
> 4C4D/90FFF9C8
> May 21 21:28:58 d8 postgres[53574]: [3-1] 2015-05-21 21:28:58.328 CEST
> @ 53574 LOG: database system is ready to accept read only
> connections
> May 21 21:28:58 d8 postgres[53575]: [16-1] 2015-05-21 21:28:58.349
> CEST @ 53575 LOG: restored log file "0000000400004C4D00000091" from
> archive
> May 21 21:28:59 d8 postgres[53575]: [17-1] 2015-05-21 21:28:59.962
> CEST @ 53575 LOG: restored log file "0000000400004C4D00000092" from
> archive
> May 21 21:29:00 d8 postgres[53575]: [18-1] 2015-05-21 21:29:00.037
> CEST @ 53575 LOG: unexpected pageaddr 4C45/DC000000 in log segment
> 0000000400004C4D00000093, offset 0
> May 21 21:29:00 d8 postgres[53640]: [4-1] 2015-05-21 21:29:00.047 CEST
> @ 53640 LOG: started streaming WAL from primary at 4C4D/92000000 on
> timeline 4
>
> What was wrong?
>

This error message can be ignored, did you try shipping
0000000400004C4D00000093 to slave from master's archive location and see if
slave picked it up and applied ?

Regards,
Venkata B N

Fujitsu Australia

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2015-05-22 02:44:46 Re: date with month and year
Previous Message Ian Barwick 2015-05-22 00:06:55 Re: Enum in foreign table: error and correct way to handle.

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-05-22 00:45:27 Re: Postgres and TLSv1.2
Previous Message Tom Lane 2015-05-21 23:48:42 Re: GiST KNN Crasher