RE: BUG #17375: RECOVERY TARGET TIME RESTORE IS FAILING TO START SERVER

From: B Ganesh Kishan <bkishan(at)commvault(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "peter(dot)eisentraut(at)2ndquadrant(dot)com" <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "leif(at)lako(dot)no" <leif(at)lako(dot)no>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Meera Nair <mnair(at)commvault(dot)com>, Dev-PostgreSQL <Dev-PostgreSQL(at)commvault(dot)com>
Subject: RE: BUG #17375: RECOVERY TARGET TIME RESTORE IS FAILING TO START SERVER
Date: 2022-01-27 13:59:46
Message-ID: SA1PR19MB5088936CE8AE66E9AD09429FB7219@SA1PR19MB5088.namprd19.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Adding Peter/Leif , who worked on this change.

Hi Peter/Lief,

Could you please help here?

Server start ended with FATAL error and specified the last completed transaction time.

2022-01-27 15:35:33.962 IST [29209] LOG: starting point-in-time recovery to 2022-01-27 12:56:34+05:30
2022-01-27 15:35:33.966 IST [29209] LOG: restored log file "00000009.history" from archive
2022-01-27 15:35:33.981 IST [29209] LOG: restored log file "000000090000000000000061" from archive
2022-01-27 15:35:42.121 IST [29209] LOG: redo starts at 0/61000060
2022-01-27 15:35:42.122 IST [29209] LOG: consistent recovery state reached at 0/61000170
2022-01-27 15:35:42.123 IST [29207] LOG: database system is ready to accept read only connections
2022-01-27 15:35:42.139 IST [29209] LOG: restored log file "000000090000000000000062" from archive
2022-01-27 15:35:42.456 IST [29209] LOG: restored log file "000000090000000000000063" from archive
2022-01-27 15:35:44.923 IST [29209] LOG: restored log file "000000090000000000000064" from archive
2022-01-27 15:35:45.897 IST [29209] LOG: restored log file "000000090000000000000065" from archive
cp: cannot stat '/var/lib/pgsql/13/wal/000000090000000000000066': No such file or directory
2022-01-27 15:35:46.280 IST [29209] LOG: redo done at 0/65000148
2022-01-27 15:35:46.280 IST [29209] LOG: last completed transaction was at log time 2022-01-25 15:05:34.132546+05:30
2022-01-27 15:35:46.280 IST [29209] FATAL: recovery ended before configured recovery target was reached
2022-01-27 15:35:46.281 IST [29207] LOG: startup process (PID 29209) exited with exit code 1
2022-01-27 15:35:46.281 IST [29207] LOG: terminating any other active server processes
2022-01-27 15:35:46.326 IST [29207] LOG: database system is shut down

Provided the recovery target time as the last completed transaction time specified in the above log. But the server start is still failing with same error.

2022-01-27 19:23:30.148 IST [7598] LOG: starting PostgreSQL 13.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2022-01-27 19:23:30.148 IST [7598] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-01-27 19:23:30.148 IST [7598] LOG: listening on IPv6 address "::", port 5432
2022-01-27 19:23:30.149 IST [7598] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-01-27 19:23:30.150 IST [7598] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-01-27 19:23:30.153 IST [7600] LOG: database system was interrupted while in recovery at log time 2022-01-25 14:59:53 IST
2022-01-27 19:23:30.153 IST [7600] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
cp: cannot stat '/var/lib/pgsql/13/wal/0000000A.history': No such file or directory
2022-01-27 19:23:30.206 IST [7600] LOG: starting point-in-time recovery to 2022-01-25 15:05:34.132546+05:30
2022-01-27 19:23:30.209 IST [7600] LOG: restored log file "00000009.history" from archive
2022-01-27 19:23:30.247 IST [7600] LOG: restored log file "000000090000000000000061" from archive
2022-01-27 19:23:30.502 IST [7600] LOG: redo starts at 0/61000060
2022-01-27 19:23:30.528 IST [7600] LOG: restored log file "000000090000000000000062" from archive
2022-01-27 19:23:31.222 IST [7600] LOG: restored log file "000000090000000000000063" from archive
2022-01-27 19:23:31.942 IST [7600] LOG: restored log file "000000090000000000000064" from archive
2022-01-27 19:23:33.007 IST [7600] LOG: consistent recovery state reached at 0/640016A0
2022-01-27 19:23:33.008 IST [7598] LOG: database system is ready to accept read only connections
2022-01-27 19:23:33.025 IST [7600] LOG: restored log file "000000090000000000000065" from archive
cp: cannot stat '/var/lib/pgsql/13/wal/000000090000000000000066': No such file or directory
2022-01-27 19:23:33.409 IST [7600] LOG: redo done at 0/65000148
2022-01-27 19:23:33.409 IST [7600] LOG: last completed transaction was at log time 2022-01-25 15:05:34.132546+05:30
2022-01-27 19:23:33.409 IST [7600] FATAL: recovery ended before configured recovery target was reached
2022-01-27 19:23:33.411 IST [7598] LOG: startup process (PID 7600) exited with exit code 1
2022-01-27 19:23:33.411 IST [7598] LOG: terminating any other active server processes
2022-01-27 19:23:33.415 IST [7598] LOG: database system is shut down

Thanks and Regards,
Ganesh Kishan

-----Original Message-----
From: B Ganesh Kishan
Sent: 24 January 2022 11:25
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>; David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org; Meera Nair <mnair(at)commvault(dot)com>
Subject: RE: BUG #17375: RECOVERY TARGET TIME RESTORE IS FAILING TO START SERVER

Hi Tom/David,

I understand the situation now.

The problem here for us is as below:
1) We run a Full backup using pg_start_backup()--> backup data directory --> pg_stop_backup() --> backup all archive log files.
Call this as FULL_BACKUP1

2) After full, log only backups are run periodically --> Using pg_switch_wal() and backup only archive log files.
During the log backups, there is a chance that no transaction is run in-between, but we there is no way to know about that at this point and we backup the switched logs.
Example: In below 3 log only backups, say we do not have any transactions run in between.
a. LOG_BACKUP_1
b. LOG_BACKUP_2
c. LOG_BACKUP_3

3) Now in our case if the user wants to restore to the time when LOG_BACKUP_2 is run, we specify the recovery_target_time as time when LOG_BACKUP_2 was run and try to start the server. But it fails because in PG13 we cannot specify time later than the last available transaction and we need to specify the time when the last transaction is available(Is there any way to get this time?).

To handle this, we would need any of these resolution/Workarounds:
1) During restore, is there any way we can get the last available transaction near to the specified retore time?
2) Is there any way we can make sure archive logs are generated only when there is a transaction available?
3) It would be very helpful if there are any settings to override old feature(Stop and bring up the server with latest available logs) over the new one.

Thanks and Regards,
Ganesh Kishan

-----Original Message-----
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Sent: 21 January 2022 20:51
To: David G. Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: B Ganesh Kishan <bkishan(at)commvault(dot)com>; pgsql-bugs(at)lists(dot)postgresql(dot)org; Meera Nair <mnair(at)commvault(dot)com>
Subject: Re: BUG #17375: RECOVERY TARGET TIME RESTORE IS FAILING TO START SERVER

External email. Inspect before opening.

"David G. Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> writes:
> On Fri, Jan 21, 2022 at 4:20 AM B Ganesh Kishan
> <bkishan(at)commvault(dot)com>
> wrote:
>> The problem is that we are providing a time target that Postgres does
>> not know how to reach. This is because there are no transactions in
>> between the backups.

> I don't quite follow the overall situation but given your observation
> and apparent acceptance of the pre-v13 behavior just don't specify a
> restore point and let WAL replay everything.

Yeah. If I'm understanding the situation, when you specify a target time that is later than the last transaction available from WAL, older versions silently assumed that stopping with the last available transaction is OK.
Newer ones complain because it's not clear whether that's OK --- in particular, there's no good way to be sure that no WAL is missing.

On the whole I think that's a good change. I can sympathize with the complaint that it creates additional complexity for restore scripts, but I'm a little dubious that this is something you'd be wanting to script anyway.

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Bharath Rupireddy 2022-01-27 14:16:55 Re: BUG #17375: RECOVERY TARGET TIME RESTORE IS FAILING TO START SERVER
Previous Message Christophe Courtois 2022-01-27 10:11:50 No access to TOAST tables shown in EXPLAIN ( ANALYZE, BUFFERS )