Re: [BUG] non archived WAL removed during production crash recovery

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, jgdr(at)dalibo(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, michael(at)paquier(dot)xyz
Subject: Re: [BUG] non archived WAL removed during production crash recovery
Date: 2020-04-02 05:19:15
Message-ID: 03900fc5-fff4-109a-fe69-83de2c167929@oss.nttdata.com
Lists: pgsql-bugs pgsql-hackers

On 2020/04/02 13:07, Kyotaro Horiguchi wrote:
> Sorry, it was quite ambiguous.
>
> At Thu, 02 Apr 2020 13:04:43 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
>> At Wed, 1 Apr 2020 18:17:35 +0200, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote in
>>> Please, find in attachment a patch implementing this.
>>
>> The patch partially reintroduces the issue the patch have
>> fixed. Specifically a standby running a crash recovery wrongly marks a
>> WAL file as ".ready" if it is extant in pg_wal without accompanied by
>> .ready file.
>
> The patch partially reintroduces the issue the commit 78ea8b5daa have
> fixed. Specifically a standby running a crash recovery wrongly marks a
> WAL file as ".ready" if it is extant in pg_wal without accompanied by
> .ready file.

On second thought, I think that we should discuss what the desirable
behavior is before the implementation. What's especially unclear to me
is whether to remove such WAL files in the archive recovery case with
archive_mode=on. Those WAL files would be required when recovering
from a backup taken before that archive recovery happened.
So it seems unsafe to remove them in that case.

Therefore, IMO the patch should change the code so that
no unarchived WAL files are removed, not only in crash recovery
but also in archive recovery. Thoughts?
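
As a rough illustration only (a toy model, not PostgreSQL source), the rule proposed above could be sketched as follows. The names ArchiveMode, RecoveryKind and may_remove_segment are hypothetical, and the sketch assumes that "unarchived" means the segment has no ".done" status file.

```python
from enum import Enum


class ArchiveMode(Enum):
    OFF = "off"
    ON = "on"
    ALWAYS = "always"


class RecoveryKind(Enum):
    NONE = "none"        # normal operation
    CRASH = "crash"      # crash recovery
    ARCHIVE = "archive"  # archive recovery or standby


def may_remove_segment(archive_mode: ArchiveMode,
                       recovery: RecoveryKind,
                       has_done_file: bool) -> bool:
    """Hypothetical predicate: may this WAL segment be removed or recycled?

    Assumption: a segment counts as archived only if a ".done" status
    file exists for it.
    """
    if archive_mode is ArchiveMode.OFF:
        # Archiving is disabled, so there is nothing to protect.
        return True
    # Under the proposed rule the recovery kind is deliberately ignored:
    # an unarchived segment is kept in crash recovery and archive recovery alike.
    return has_done_file
```

Under this model a segment without ".done" is kept in every recovery state, for example may_remove_segment(ArchiveMode.ON, RecoveryKind.ARCHIVE, False) evaluates to False.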

Of course, this change would mean that old unarchived
WAL files keep accumulating in the case of a warm standby using archive
recovery. But this issue looks unavoidable. If users want to avoid that,
archive_mode should be set to always.

Also, I'm wondering whether it's really safe to remove such unarchived
WAL files even in the standby case with archive_mode=on. I would need
more time to think about that.

>> Perhaps checking '.ready' before the checking for archive-mode would
>> be sufficient.
>>
>>> Plus, I added a second commit to add one test in regard with this bug.
>>>
>>>> Another is to make the startup process remove .ready file if necessary.
>>>
>>> I'm not sure to understand this one.

I was thinking of making the startup process remove such unarchived WAL files
if archive_mode=on and StandbyModeRequested or ArchiveRecoveryRequested
is true.
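
The condition described above can be written out as a small predicate. This is a minimal sketch under stated assumptions, not the actual startup-process code: the function name is hypothetical, and the two boolean parameters merely mirror the StandbyModeRequested and ArchiveRecoveryRequested flags mentioned in the text.

```python
def startup_should_remove_unarchived(archive_mode: str,
                                     standby_mode_requested: bool,
                                     archive_recovery_requested: bool) -> bool:
    """Hypothetical check for the alternative idea.

    Returns True only when archive_mode is exactly "on" and either
    standby mode or archive recovery was requested.
    """
    return archive_mode == "on" and (
        standby_mode_requested or archive_recovery_requested
    )
```

Note that archive_mode=always is excluded here, since the text only names archive_mode=on for this alternative.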

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
