Re: "could not open file "pg_wal/…": No such file or directory" potential crashing bug due to race condition between restartpoint and recovery

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Thomas Crayford <tcrayford(at)salesforce(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: "could not open file "pg_wal/…": No such file or directory" potential crashing bug due to race condition between restartpoint and recovery
Date: 2018-09-24 15:37:05
Message-ID: 20180924153705.GB1103@paquier.xyz
Lists: pgsql-bugs

On Mon, Sep 24, 2018 at 12:58:59PM +0100, Thomas Crayford wrote:
> May 20 09:56:14 redacted[9]: [2468859-1] sql_error_code = 00000 LOG:
> restored log file "00000002000072B50000003A" from archive
> May 20 09:56:14 ip-10-0-92-26 redacted[141]: [191806-1] sql_error_code =
> 58P01 ERROR: could not open file "pg_wal/00000002000072B50000003A": No such
> file or directory

What kind of restore_command is used here?

> Looking at the code, I think that the two racing functions are
> RestoreArchivedFile, and CreateRestartPoint.
>
> The former calls unlink on the wal segment, CreateRestartPoint does attempt
> to do recycling on segments.

Don't you mean KeepFileRestoredFromArchive()? RestoreArchivedFile would
call unlink() on pg_wal/RECOVERYXLOG so that does not match.
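
To make the distinction concrete, here is a minimal toy model in Python. It is not PostgreSQL's actual code. It assumes the restore step writes to the temporary name pg_wal/RECOVERYXLOG and that a separate step then renames that file to the real segment name. The function names below are hypothetical stand-ins that only mimic the file operations, and recycle_segment is an assumed simplification of whatever the restartpoint does to old segments.

```python
import os
import tempfile


def restore_archived_file(wal_dir, archive_path):
    """Toy restore step: copy the archived segment to a temporary name."""
    tmp = os.path.join(wal_dir, "RECOVERYXLOG")
    if os.path.exists(tmp):
        # The unlink here only ever targets the temporary name.
        os.unlink(tmp)
    with open(archive_path, "rb") as src, open(tmp, "wb") as dst:
        dst.write(src.read())
    return tmp


def keep_file_restored_from_archive(wal_dir, tmp, segment):
    """Toy keep step: move the temporary file to the real segment name."""
    final = os.path.join(wal_dir, segment)
    if os.path.exists(final):
        # This unlink targets the real segment name, unlike the restore step.
        os.unlink(final)
    os.rename(tmp, final)
    return final


def recycle_segment(wal_dir, segment):
    """Hypothetical stand-in for restartpoint cleanup removing a segment."""
    path = os.path.join(wal_dir, segment)
    if os.path.exists(path):
        os.unlink(path)


def demo():
    """Run restore, keep, cleanup, then try to open the segment."""
    segment = "00000002000072B50000003A"
    with tempfile.TemporaryDirectory() as root:
        wal_dir = os.path.join(root, "pg_wal")
        os.mkdir(wal_dir)
        archive_path = os.path.join(root, segment)
        with open(archive_path, "wb") as f:
            f.write(b"wal")

        tmp = restore_archived_file(wal_dir, archive_path)
        final = keep_file_restored_from_archive(wal_dir, tmp, segment)

        # Assumed interleaving: cleanup runs before recovery opens the file.
        recycle_segment(wal_dir, segment)
        try:
            open(final, "rb").close()
            return "opened"
        except FileNotFoundError:
            return "missing"


if __name__ == "__main__":
    print(demo())
```

In this sketch only the keep and cleanup steps ever touch the path that recovery later opens, which is why an unlink on RECOVERYXLOG alone would not explain the reported error.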
--
Michael
