Re: [BUG] non archived WAL removed during production crash recovery

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: [BUG] non archived WAL removed during production crash recovery
Date: 2020-04-21 02:15:01
Message-ID: 20200421021501.GE77439@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Mon, Apr 20, 2020 at 02:22:35PM +0200, Jehan-Guillaume de Rorthais wrote:
> The problem is that we would have to read the controldata file each time we
> wonder if a segment should be archived/removed. Moreover, the controldata
> file might not be in sync quickly enough with the real state for some other
> code path or futur needs.

I don't think that this is what Horiguchi-san meant here. What I got
from his previous message would be to be to copy the shared value from
the control file when necessary, and have the shared state use only a
subset of the existing values of DBState, aka:
- DB_IN_CRASH_RECOVERY
- DB_IN_ARCHIVE_RECOVERY
- DB_IN_PRODUCTION
Still, that sounds wrong to me because then somebody would be tempted
to change the shared value thinking that things like DB_SHUTDOWNING,
DB_SHUTDOWNED_* or DB_STARTUP are valid but we don't want that here.
Note that there may be a case for DB_STARTUP to be used in
XLOGShmemInit(), but I'd rather let the code use the safest default,
DB_IN_CRASH_RECOVERY to control that we won't remove .ready files by
default until the startup process sees fit to do the actual switch
depending on the checkpoint record lookup, if archive recovery was
actually requested, etc.

> Indeed, Benoît Lobréau reported this behavior to me.

Noted. Thanks for the information. I don't think that I have ever
met Benoît in person, do I? Tell him that I owe him one beer or a
beverage of his choice when we meet IRL, and that he had better use
this message-id to make me keep my promise :)
--
Michael

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-04-21 03:09:25 Re: [BUG] non archived WAL removed during production crash recovery
Previous Message Bruce Momjian 2020-04-21 00:57:16 Re: BUG #16361: [pgadmin-v4.20] Empty error message when you try to generate table create script

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-04-21 02:25:05 Re: 001_rep_changes.pl stalls
Previous Message Thomas Munro 2020-04-21 02:05:18 Re: fixing old_snapshot_threshold's time->xid mapping