BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: leif(at)lako(dot)no
Subject: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write
Date: 2019-01-11 08:36:35
Message-ID: 15589-8e6e353a97b085fb@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

The following bug has been logged on the website:

Bug reference: 15589
Logged by: Leif Gunnar Erlandsen
Email address: leif(at)lako(dot)no
PostgreSQL version: 11.1
Operating system: Red Hat Enterprise Linux Server release 7.6
Description:

recovery.conf was specified as
restore_command = 'cp /db/disk1/restore/archivelogs/archivelogs/%f %p'
recovery_target_time = '2019-01-03 13:00:00'

Due to some missing wal-files restore ended.

2019-01-10 12:05:50 CET [68417]: [67-1] user=,db=,app=,client= LOG:
restored log file "0000000500000158000000FF" from archive
cp: cannot stat
'/db/disk1/restore/archivelogs/archivelogs/000000050000015900000000': No
such file or directory
2019-01-10 12:05:50 CET [68417]: [68-1] user=,db=,app=,client= LOG: redo
done at 158/FFFFFFB8
2019-01-10 12:05:50 CET [68417]: [69-1] user=,db=,app=,client= LOG: last
completed transaction was at log time 2019-01-03 06:34:45.935752+01
2019-01-10 12:05:50 CET [68417]: [70-1] user=,db=,app=,client= LOG:
restored log file "0000000500000158000000FF" from archive
cp: cannot stat
'/db/disk1/restore/archivelogs/archivelogs/00000006.history': No such file
or directory
2019-01-10 12:05:50 CET [68417]: [71-1] user=,db=,app=,client= LOG:
selected new timeline ID: 6
2019-01-10 12:05:50 CET [68417]: [72-1] user=,db=,app=,client= LOG: archive
recovery complete
cp: cannot stat
'/db/disk1/restore/archivelogs/archivelogs/00000005.history': No such file
or directory
2019-01-10 12:05:51 CET [68420]: [2-1] user=,db=,app=,client= LOG:
restartpoint complete: wrote 44395 buffers (5.4%); 1 WAL file(s) added, 0
removed, 0 recycled; write=6.310 s, sync=0.268 s, total=6.631 s; sync
files=178, longest=0.072 s, average=0.001 s; distance=64019 kB,
estimate=64019 kB
2019-01-10 12:05:51 CET [68420]: [3-1] user=,db=,app=,client= LOG: recovery
restart point at 158/C4E84F98
2019-01-10 12:05:51 CET [68420]: [4-1] user=,db=,app=,client= DETAIL: Last
completed transaction was at log time 2019-01-03 06:34:45.935752+01.
2019-01-10 12:05:51 CET [68420]: [5-1] user=,db=,app=,client= LOG:
checkpoints are occurring too frequently (7 seconds apart)
2019-01-10 12:05:51 CET [68420]: [6-1] user=,db=,app=,client= HINT:
Consider increasing the configuration parameter "max_wal_size".
2019-01-10 12:05:51 CET [68420]: [7-1] user=,db=,app=,client= LOG:
checkpoint starting: end-of-recovery immediate wait xlog
2019-01-10 12:05:51 CET [68420]: [8-1] user=,db=,app=,client= LOG:
checkpoint complete: wrote 18678 buffers (2.3%); 0 WAL file(s) added, 0
removed, 0 recycled; write=0.251 s, sync=0.006 s, total=0.312 s; sync
files=149, longest=0.006 s, average=0.000 s; distance=968172 kB,
estimate=968172 kB
2019-01-10 12:05:51 CET [68415]: [8-1] user=,db=,app=,client= LOG: database
system is ready to accept connections

I found the missing wal-files and performed restore again from the start.
Then recovery paused when it was at correct time.

2019-01-10 13:46:28 CET [77004]: [3334-1] user=,db=,app=,client= LOG:
restored log file "0000000500000165000000C2" from archive
2019-01-10 13:46:28 CET [77007]: [318-1] user=,db=,app=,client= LOG:
restartpoint complete: wrote 87591 buffers (10.7%); 0 WAL file(s) added, 22
removed, 20 recycled; write=3.049 s, sync=0.001 s, total=3.192 s; sync
files=143, longest=0.001 s, average=0.000 s; distance=688531 kB,
estimate=689818 kB
2019-01-10 13:46:28 CET [77007]: [319-1] user=,db=,app=,client= LOG:
recovery restart point at 165/9706C358
2019-01-10 13:46:28 CET [77007]: [320-1] user=,db=,app=,client= DETAIL:
Last completed transaction was at log time 2019-01-03 12:13:22.014815+01.
2019-01-10 13:46:28 CET [77007]: [321-1] user=,db=,app=,client= LOG:
restartpoint starting: xlog
2019-01-10 13:46:29 CET [77004]: [3335-1] user=,db=,app=,client= LOG:
restored log file "0000000500000165000000C3" from archive
2019-01-10 13:46:29 CET [77004]: [3336-1] user=,db=,app=,client= LOG:
recovery stopping before commit of transaction 3316604, time 2019-01-03
13:00:01.563953+01
2019-01-10 13:46:29 CET [77004]: [3337-1] user=,db=,app=,client= LOG:
recovery has paused
2019-01-10 13:46:29 CET [77004]: [3338-1] user=,db=,app=,client= HINT:
Execute pg_wal_replay_resume() to continue.

PostgreSQL should have paused recovery also on the first scenario. Then I
could have added missing wal and continued with restore.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Jeff Janes 2019-01-11 14:03:04 Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write
Previous Message Thomas Munro 2019-01-11 05:16:27 Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-01-11 09:06:18 Re: Displaying and dumping of table access methods
Previous Message Pavel Stehule 2019-01-11 08:34:58 Re: [Proposal] Add accumulated statistics