Re: requested timeline ... does not contain minimum recovery point ...

From: Christophe Pettus <xof(at)thebuild(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: requested timeline ... does not contain minimum recovery point ...
Date: 2018-07-12 21:26:17
Message-ID: ECE3B665-E9DD-43FF-B6A6-734E7435221D@thebuild.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On Jul 12, 2018, at 10:29, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> This needs a lot more information before somebody can reasonably act on
> it.

Happy to provide, of course!

The two relevant hosts are "Ash" and "Chi". As mentioned, they've been flipped back and forth repeatedly using pg_rewind: One will be promoted, the other pg_rewind'd, and then brought up as a WAL shipping secondary to the first. This procedure has worked repeatedly, but this last instance failed. The logs from the failure are attached below, along with the output of from pg_controldata for both hosts. The systems are still in this configuration, so we can gather more as required.

What surprises me about the error is that while the recovery point seems reasonable, it shouldn't be on timeline 103, but on timeline 105.

Failover to Ash:
2018-07-10 19:14:22 UTC [2072]: [5153-1] user=,db=,app=,client= LOG: received promote request
2018-07-10 19:14:22 UTC [2072]: [5154-1] user=,db=,app=,client= LOG: redo done at A58/4F0822D0
2018-07-10 19:14:22 UTC [2072]: [5155-1] user=,db=,app=,client= LOG: last completed transaction was at log time 2018-07-10 19:13:54.515121+00
2018-07-10 19:14:23 UTC [2072]: [5156-1] user=,db=,app=,client= LOG: restored log file "0000006700000A580000004F" from archive
2018-07-10 19:14:23 UTC [2072]: [5157-1] user=,db=,app=,client= LOG: selected new timeline ID: 104
2018-07-10 19:14:24 UTC [2072]: [5158-1] user=,db=,app=,client= LOG: restored log file "00000067.history" from archive

[Chi rewound using pg_rewind against Ash, brought up]

Chi:
2018-07-10 19:26:05 UTC [3260]: [4-1] user=,db=,app=,client= LOG: restored log file "00000068.history" from archive
2018-07-10 19:26:06 UTC [3260]: [5-1] user=,db=,app=,client= LOG: redo starts at A58/4E061EF8
2018-07-10 19:26:07 UTC [3260]: [6-1] user=,db=,app=,client= LOG: restored log file "0000006800000A5800000050" from archive
2018-07-10 19:26:08 UTC [3260]: [7-1] user=,db=,app=,client= LOG: restored log file "0000006800000A5800000051" from archive
2018-07-10 19:26:09 UTC [3260]: [8-1] user=,db=,app=,client= LOG: restored log file "0000006800000A5800000052" from archive
2018-07-10 19:26:10 UTC [3260]: [9-1] user=,db=,app=,client= LOG: restored log file "0000006800000A5800000053" from archive
2018-07-10 19:26:11 UTC [3260]: [10-1] user=,db=,app=,client= LOG: restored log file "0000006800000A5800000054" from archive
2018-07-10 19:26:12 UTC [3260]: [11-1] user=,db=,app=,client= LOG: restored log file "0000006800000A5800000055" from archive
2018-07-10 19:26:13 UTC [3260]: [12-1] user=,db=,app=,client= LOG: restored log file "0000006800000A5800000056" from archive
2018-07-10 19:26:13 UTC [3260]: [13-1] user=,db=,app=,client= LOG: restored log file "0000006800000A5800000057" from archive
2018-07-10 19:26:14 UTC [3260]: [14-1] user=,db=,app=,client= LOG: restored log file "0000006800000A5800000058" from archive
2018-07-10 19:26:15 UTC [3260]: [15-1] user=,db=,app=,client= LOG: restored log file "0000006800000A5800000059" from archive
2018-07-10 19:26:15 UTC [3260]: [16-1] user=,db=,app=,client= LOG: restored log file "0000006800000A580000005A" from archive
2018-07-10 19:26:16 UTC [3260]: [17-1] user=,db=,app=,client= LOG: restored log file "0000006800000A580000005B" from archive
2018-07-10 19:26:17 UTC [3260]: [18-1] user=,db=,app=,client= LOG: restored log file "0000006800000A580000005C" from archive
2018-07-10 19:27:28 UTC [3260]: [19-1] user=,db=,app=,client= LOG: restored log file "0000006800000A580000005D" from archive
2018-07-10 19:27:28 UTC [3260]: [20-1] user=,db=,app=,client= LOG: consistent recovery state reached at A58/5D01BCA8
2018-07-10 19:27:28 UTC [2564]: [3-1] user=,db=,app=,client= LOG: database system is ready to accept read only connections
2018-07-10 19:28:28 UTC [3260]: [21-1] user=,db=,app=,client= LOG: restored log file "0000006800000A580000005E" from archive

Chi promoted:
2018-07-10 19:28:37 UTC [3260]: [22-1] user=,db=,app=,client= LOG: received promote request
2018-07-10 19:28:37 UTC [3260]: [23-1] user=,db=,app=,client= LOG: redo done at A58/5E0817D0
2018-07-10 19:28:37 UTC [3260]: [24-1] user=,db=,app=,client= LOG: last completed transaction was at log time 2018-07-10 19:28:12.954559+00
2018-07-10 19:28:37 UTC [3260]: [25-1] user=,db=,app=,client= LOG: restored log file "0000006800000A580000005E" from archive
2018-07-10 19:28:37 UTC [3260]: [26-1] user=,db=,app=,client= LOG: selected new timeline ID: 105
2018-07-10 19:28:38 UTC [3260]: [27-1] user=,db=,app=,client= LOG: restored log file "00000068.history" from archive
2018-07-10 19:28:38 UTC [3260]: [28-1] user=,db=,app=,client= LOG: archive recovery complete
2018-07-10 19:28:38 UTC [3260]: [29-1] user=,db=,app=,client= LOG: MultiXact member wraparound protections are now enabled
2018-07-10 19:28:38 UTC [5068]: [1-1] user=,db=,app=,client= LOG: checkpoint starting: force
2018-07-10 19:28:38 UTC [2564]: [4-1] user=,db=,app=,client= LOG: database system is ready to accept connections

[Ash rewound using pg_rewind against Chi, brought up]

Ash:
2018-07-10 19:40:09 UTC [520]: [3-1] user=,db=,app=,client= LOG: restored log file "00000069.history" from archive
2018-07-10 19:40:10 UTC [520]: [4-1] user=,db=,app=,client= LOG: entering standby mode
2018-07-10 19:40:10 UTC [520]: [5-1] user=,db=,app=,client= LOG: restored log file "00000069.history" from archive
2018-07-10 19:40:12 UTC [520]: [6-1] user=,db=,app=,client= LOG: restored log file "0000006700000A580000004E" from archive
2018-07-10 19:40:12 UTC [520]: [7-1] user=,db=,app=,client= FATAL: requested timeline 105 does not contain minimum recovery point A58/6B109F28 on timeline 103

--
From Ash:

C:\Windows\system32>pg_controldata -D e:\pgdata
pg_control version number: 960
Catalog version number: 201608131
Database system identifier: 6377726782605679660
Database cluster state: in archive recovery
pg_control last modified: 7/10/2018 7:28:38 PM
Latest checkpoint location: A58/4E0689F0
Prior checkpoint location: A58/51E0EF50
Latest checkpoint's REDO location: A58/4E061EF8
Latest checkpoint's REDO WAL file: 0000006700000A580000004E
Latest checkpoint's TimeLineID: 103
Latest checkpoint's PrevTimeLineID: 103
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:2428875781
Latest checkpoint's NextOID: 2368444
Latest checkpoint's NextMultiXactId: 2
Latest checkpoint's NextMultiOffset: 3
Latest checkpoint's oldestXID: 2243636521
Latest checkpoint's oldestXID's DB: 16397
Latest checkpoint's oldestActiveXID: 2428782169
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 16396
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: 7/10/2018 7:12:11 PM
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: A58/6B109F28
Min recovery ending loc's timeline: 103
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: on
max_connections setting: 1000
max_worker_processes setting: 8
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0

From Chi:

C:\Windows\system32>pg_controldata -D e:\pgdata
pg_control version number: 960
Catalog version number: 201608131
Database system identifier: 6377726782605679660
Database cluster state: in production
pg_control last modified: 7/11/2018 3:39:21 PM
Latest checkpoint location: A5A/C205EE68
Prior checkpoint location: A5A/BF15ED30
Latest checkpoint's REDO location: A5A/C1102590
Latest checkpoint's REDO WAL file: 0000006900000A5A000000C1
Latest checkpoint's TimeLineID: 105
Latest checkpoint's PrevTimeLineID: 105
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:2431541507
Latest checkpoint's NextOID: 2384828
Latest checkpoint's NextMultiXactId: 2
Latest checkpoint's NextMultiOffset: 3
Latest checkpoint's oldestXID: 2243636521
Latest checkpoint's oldestXID's DB: 16397
Latest checkpoint's oldestActiveXID: 2431541507
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 16396
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: 7/11/2018 3:38:41 PM
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: on
max_connections setting: 1000
max_worker_processes setting: 8
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0

--
-- Christophe Pettus
xof(at)thebuild(dot)com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-07-12 21:45:35 Re: Temporary WAL segments files not cleaned up after an instance crash
Previous Message Tom Lane 2018-07-12 20:05:38 Re: When use prepared protocol, transaction will hold backend_xmin until the end of the transaction.