Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write

From: leif(at)lako(dot)no
To: "Michael Paquier" <michael(at)paquier(dot)xyz>, "Kyotaro HORIGUCHI" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15589: Due to missing wal, restore ends prematurely and opens database for read/write
Date: 2019-02-27 09:14:40
Message-ID: df5bf3d6d83030c824da01d1f20f4363@lako.no
Lists: pgsql-bugs pgsql-hackers

"Michael Paquier" <michael(at)paquier(dot)xyz> wrote on 26 February 2019 at 09:13:

> On Thu, Jan 31, 2019 at 09:26:48PM +0900, Kyotaro HORIGUCHI wrote:
>
>> I don't think anyone expected the server to follow
>> recovery_target_action without a target being set, so we can change
>> the behavior when any kind of target is specified. So I propose
>> to follow recovery_target_action even if the target has not been
>> reached when any recovery target is specified.
>
> Quoting the docs:
> https://www.postgresql.org/docs/current/recovery-target-settings.html
> recovery_target_action (enum)
> "Specifies what action the server should take once the recovery target
> is *reached*."

I know this, and in my case recovery_target_action was "pause".
The recovery target was specified as a date and time.

> So what we have now is that an action would be taken iff a stop point
> is defined and reached. What this patch changes is that the action
> would be taken even if the stop point has *not* been reached once the
> end of a WAL stream is found.

Yes, and this is the behaviour I expected in my use case. This was a PITR scenario, restoring to a new server, not crash recovery.
I restored a backup, placed the WAL files in a separate directory, and created a recovery.conf with the correct recovery_target_time.
After PostgreSQL started, it stopped replaying after a short while and opened the database read/write.
Checks showed that the target had not been reached. The log showed that no more WAL could be found.
If PostgreSQL had followed recovery_target_action, I could have restored the missing WAL files and continued the replay.
Since it did not, I had to restart the whole process from the beginning, which took many hours.
Another thing to consider: in cases like this one, where a lot of WAL is needed for replay, there is not always enough disk space available to hold all of it at the same time.
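
To make the difference concrete, here is a small Python sketch of the two behaviours as I understand them from this thread. It is only a toy model, not PostgreSQL code: the function name, its parameters and the "open_read_write" label are made up for illustration.

```python
def end_of_recovery_outcome(target_set: bool,
                            target_reached: bool,
                            recovery_target_action: str = "pause",
                            proposed: bool = False) -> str:
    """Toy model of what happens when WAL replay stops.

    Hypothetical helper for illustration only; it does not mirror
    the actual PostgreSQL source.
    """
    # Current and proposed behaviour agree here: the action is taken
    # when a stop point is defined and has been reached.
    if target_set and target_reached:
        return recovery_target_action

    # Proposed change: with any target specified, take the action even
    # though the WAL ran out before the target was reached.
    if proposed and target_set:
        return recovery_target_action

    # Otherwise recovery simply ends and the database is opened
    # for read/write.
    return "open_read_write"


# My case: target time set, WAL ran out before it was reached.
print(end_of_recovery_outcome(True, False, "pause"))
print(end_of_recovery_outcome(True, False, "pause", proposed=True))
```

With the current behaviour the first call models what happened to me (the database was opened read/write); with the proposed behaviour the second call would give "pause", which would have let me add the missing WAL files and continue.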


> Please do not take me wrong, I can see that there could be use cases
> where it is possible to take an action at the end of a WAL stream if
> there is less WAL than what was planned, perhaps if the OP has set
> an incorrect stop position too far in the future, still too much WAL
> would have been replayed so it would make the base backup unusable for
> future uses. Also, it looks incorrect to me to change an existing
> behavior and to use the same semantics for triggering an action if a
> stop point is defined and reached.

I did not set an incorrect stop position. I see this change as something most people in a similar situation would expect from their database system.

AFAIK the docs do not specify what happens if recovery_target_time is specified but not reached. But since recovery_target_action defaults to "pause", I would have assumed "pause" to be the action.

regards
Leif Gunnar Erlandsen
