Re: recovery starting when backup_label exists, but not recovery.signal

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovery starting when backup_label exists, but not recovery.signal
Date: 2019-09-27 07:02:00
Message-ID: CAD21AoD-Pp7+hjKcKT5jZ10kV_53_Zw18oaOVEYYu_bdNLx5kw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 27, 2019 at 3:36 AM David Steele <david(at)pgmasters(dot)net> wrote:
>
> On 9/24/19 1:25 AM, Fujii Masao wrote:
> >
> > When backup_label exists, the startup process enters archive recovery mode
> > even if the recovery.signal file doesn't exist. In this case, the startup
> > process tries to retrieve WAL files by using restore_command. Then, at the
> > beginning of archive recovery, the contents of backup_label are copied to
> > pg_control and the backup_label file is removed. This is presumably
> > intentional behavior.
>
> > But I think the problem is that, if the server shuts down during that
> > archive recovery, restarting the server may cause recovery to fail
> > because neither backup_label nor recovery.signal exists and the server
> > doesn't enter archive recovery mode. Is this intentional, too? Seems not.
> >
> > So the problematic scenario is;
> >
> > 1. the server starts with backup_label, but not recovery.signal.
> > 2. the startup process enters archive recovery mode because
> > backup_label exists.
> > 3. the contents of backup_label are copied to pg_control and
> > backup_label is deleted.
>
> Do you mean deleted or renamed to backup_label.old?
>
> > 4. the server shuts down.
>
> This happens after the cluster has reached consistency?
>
> > 5. the server is restarted. Neither backup_label nor recovery.signal exists.
> > 6. the startup process performs only crash recovery because neither
> > backup_label nor recovery.signal exists. Since it cannot retrieve WAL
> > files from the archive, it may fail.
>
> I tried a few ways to reproduce this but was not successful without
> manually removing WAL.

Hmm, me too. I think that since we enter crash recovery at step #6, we
don't retrieve WAL files from the archive.

But I did reproduce the problem Fujii-san mentioned: restarting the
server during archive recovery causes crash recovery to run instead of
archive recovery being resumed. This behavior differs from version 11
and earlier, and I personally think it is a change for the worse.

Regards,

--
Masahiko Sawada
