Re: "could not open file "pg_wal/…": No such file or directory" potential crashing bug due to race condition between restartpoint and recovery

From: Thomas Crayford <tcrayford(at)salesforce(dot)com>
To: michael(at)paquier(dot)xyz
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: "could not open file "pg_wal/…": No such file or directory" potential crashing bug due to race condition between restartpoint and recovery
Date: 2018-09-28 12:02:42
Message-ID: CAJgZ2Z4w=75AX6--Uumq2VMzmCJm_xTKZWf9ei58f08NEkjnyw@mail.gmail.com
Lists: pgsql-bugs

Hi there,

OK, thanks for the pointer. It seems like the race condition I described
still holds, just between different functions. Does that seem right?

Thanks

Tom

On Mon, Sep 24, 2018 at 4:37 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Mon, Sep 24, 2018 at 12:58:59PM +0100, Thomas Crayford wrote:
> > May 20 09:56:14 redacted[9]: [2468859-1] sql_error_code = 00000 LOG:
> > restored log file "00000002000072B50000003A" from archive
> > May 20 09:56:14 ip-10-0-92-26 redacted[141]: [191806-1] sql_error_code =
> > 58P01 ERROR: could not open file "pg_wal/00000002000072B50000003A": No
> > such file or directory
>
> What kind of restore_command is used here?
>
> > Looking at the code, I think that the two racing functions are
> > RestoreArchivedFile, and CreateRestartPoint.
> >
> > The former calls unlink on the wal segment, CreateRestartPoint does
> > attempt to do recycling on segments.
>
> Don't you mean KeepFileRestoredFromArchive()? RestoreArchivedFile would
> call unlink() on pg_wal/RECOVERYXLOG so that does not match.
> --
> Michael
>
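
The interleaving suspected in this thread can be replayed deterministically
as a rough illustration. This is only a sketch, assuming that recycling
renames the restored segment to a future segment name before recovery opens
it. The temporary directory, the simulate_recycle_race helper, and the
recycled file name are hypothetical; nothing here is PostgreSQL code, and it
does not show that the server actually behaves this way.

```python
import errno
import os
import tempfile


def simulate_recycle_race(segment="00000002000072B50000003A"):
    """Replay the suspected interleaving in a scratch directory.

    Returns the errno seen by the "recovery" open, or None if it succeeded.
    """
    with tempfile.TemporaryDirectory() as wal_dir:
        restored = os.path.join(wal_dir, segment)
        # Hypothetical future segment name the file gets recycled into.
        recycled = os.path.join(wal_dir, "00000002000072B5000000FF")

        # Step 1 (recovery side): the restored segment appears under its
        # final name, matching the "restored log file ..." log line.
        with open(restored, "wb") as f:
            f.write(b"\0" * 16)

        # Step 2 (restartpoint side, assumed): the segment is treated as
        # old and renamed for reuse.
        os.rename(restored, recycled)

        # Step 3 (recovery side): opening the segment by name now fails.
        try:
            fd = os.open(restored, os.O_RDONLY)
        except OSError as exc:
            return exc.errno
        os.close(fd)
        return None
```

Calling simulate_recycle_race() returns errno.ENOENT, the same condition
that is reported as "No such file or directory" in the log excerpt. If step 2
is skipped, the open succeeds, which is why the ordering of the two actors
matters.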
