Re: [BUG] non archived WAL removed during production crash recovery

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, michael(at)paquier(dot)xyz
Subject: Re: [BUG] non archived WAL removed during production crash recovery
Date: 2020-04-02 14:55:46
Message-ID: 3d543ef2-ae02-5e59-3bfe-b79d00dab360@oss.nttdata.com
Lists: pgsql-bugs pgsql-hackers

On 2020/04/02 22:49, Jehan-Guillaume de Rorthais wrote:
> On Thu, 2 Apr 2020 19:38:59 +0900
> Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
>> On 2020/04/02 16:23, Kyotaro Horiguchi wrote:
>>> At Thu, 2 Apr 2020 14:19:15 +0900, Fujii Masao
>>> <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
> [...]
>>>> is whether to remove such WAL files in archive recovery case with
>>>> archive_mode=on. Those WAL files would be required when recovering
>>>> from the backup taken before that archive recovery happens.
>>>> So it seems unsafe to remove them in that case.
>>>
>>> I'm not sure I'm getting the intention correctly, but I think it is
>>> the responsibility of the operator to provide a complete set of
>>> archived WAL files for a backup. Could you elaborate on what
>>> operation steps you are assuming?
>>
>> Please imagine the case where you need to do archive recovery
>> from the database snapshot taken while there are many WAL files
>> with .ready files. Those WAL files have not been archived yet.
>> In this case, ISTM those WAL files should not be removed until
>> they are archived, when archive_mode = on.
>
> If you rely on a snapshot without pg_start/stop_backup, I agree. These WAL
> files should be archived if:
>
> * archive_mode >= on for primary
> * archive_mode = always for standby
>
>>>> Therefore, IMO the patch should change the code so that
>>>> no unarchived WAL files are removed, not only in crash recovery
>>>> but also in archive recovery. Thoughts?
>>>
>>> Agreed if "an unarchived WAL" means "a WAL file that is marked .ready"
>>> and it should be archived immediately. My previous mail is written
>>> based on the same thought.
>>
>> Ok, so our *current* consensus seems to be the following. Right?
>>
>> - If archive_mode=off, any WAL files with .ready files are removed in
>> crash recovery, archive recovery and standby mode.
>
> yes
>
>> - If archive_mode=on, WAL files with .ready files are removed only in
>> standby mode. In crash recovery and archive recovery cases, they are
>> kept and would be archived after recovery finishes (i.e., during
>> normal processing).
>
> yes
>
>> - If archive_mode=always, in crash recovery, archive recovery and
>> standby mode, WAL files with .ready files are archived if WAL archiver
>> is running.
>
> yes
>
>> That is, WAL files with .ready files are removed when either
>> archive_mode!=always in standby mode or archive_mode=off.
>
> sounds fine to me.
>
> [...]
>>>>>>>> Another is to make the startup process remove .ready file if
>>>>>>>> necessary.
>>>>>>>
>>>>>>> I'm not sure I understand this one.
>>>>
>>>> I was thinking of making the startup process remove such unarchived
>>>> WAL files if archive_mode=on and
>>>> StandbyModeRequested/ArchiveRecoveryRequested is true.
>
> Ok, understood.
>
>>> As mentioned above, I don't understand the point of preserving WAL
>>> files that are either marked as .ready or not marked at all on a
>>> standby with archive_mode=on.
>>
>> Maybe yes. But I'm not confident that there is no such case.
>
> Well, it seems to me that this is what you suggested a few paragraphs back:
>
> «.ready files are removed when either archive_mode!=always in standby mode»

Yes, so I'm fine with that as the first consensus, because the behavior
is obviously better than the current one. *If* a case is found where no
WAL files should be removed, I'd like to propose an additional patch.
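
For reference, the consensus quoted above can be written down as a small
decision function. This is only a sketch of the agreed rules, not code
from the patch or from xlog.c: the enum and function names are
hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not PostgreSQL's own identifiers. */
typedef enum
{
	SKETCH_ARCHIVE_OFF,
	SKETCH_ARCHIVE_ON,
	SKETCH_ARCHIVE_ALWAYS
} SketchArchiveMode;

typedef enum
{
	SKETCH_CRASH_RECOVERY,
	SKETCH_ARCHIVE_RECOVERY,
	SKETCH_STANDBY_MODE
} SketchRecoveryKind;

/*
 * Returns true if a WAL file that still has a .ready file may be removed
 * under the consensus in this thread:
 *   - archive_mode=off:    removable in every kind of recovery
 *   - archive_mode=on:     removable only in standby mode
 *   - archive_mode=always: never removable (it is expected to be archived)
 */
bool
sketch_ready_wal_is_removable(SketchArchiveMode mode, SketchRecoveryKind kind)
{
	switch (mode)
	{
		case SKETCH_ARCHIVE_OFF:
			return true;
		case SKETCH_ARCHIVE_ON:
			return kind == SKETCH_STANDBY_MODE;
		case SKETCH_ARCHIVE_ALWAYS:
			return false;
	}

	/* All enum values are handled above. */
	assert(false);
	return false;
}
```

This matches the one-line summary: removal happens when either
archive_mode!=always in standby mode or archive_mode=off. If a case is
later found where no WAL files should be removed, only the
SKETCH_ARCHIVE_ON branch would need to change.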

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
