Re: Race between KeepFileRestoredFromArchive() and restartpoint

From: Noah Misch <noah(at)leadboat(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: don(at)seiler(dot)us, david(at)pgmasters(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Race between KeepFileRestoredFromArchive() and restartpoint
Date: 2022-08-03 07:28:47
Message-ID: 20220803072847.GB3817792@rfd.leadboat.com
Lists: pgsql-hackers

On Wed, Aug 03, 2022 at 11:24:17AM +0900, Kyotaro Horiguchi wrote:
> At Tue, 2 Aug 2022 16:03:42 -0500, Don Seiler <don(at)seiler(dot)us> wrote in
> > could not link file "pg_wal/xlogtemp.18799" to
> > > "pg_wal/000000010000D45300000010": File exists

> Hmm. It seems like a race condition between StartupXLOG() and
> RemoveXlogFile(). We need a wider extent of ControlFileLock. Concretely
> taking ControlFileLock before deciding the target xlog file name in
> RemoveXlogFile() seems to prevent this happening. (If this is correct
> this is a live issue on the master branch.)

RemoveXlogFile() calls InstallXLogFileSegment() with find_free=true. The
intent of find_free=true is to make it okay to pass a target xlog file that
ceases to be a good target. (InstallXLogFileSegment() searches for a good
target while holding ControlFileLock.) Can you say more about how that proved
to be insufficient?
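
To make the find_free=true behavior concrete, here is a toy, single-threaded
model of the search. It is only a sketch and not the PostgreSQL source:
install_segment(), slot_in_use[] and NUM_SLOTS are hypothetical names, an
in-memory array stands in for the files in pg_wal, and the places where
ControlFileLock would be taken and released are marked with comments only.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 16            /* hypothetical size of the toy pg_wal */

/* slot_in_use[n] is true when a file for segment n already exists. */
static bool slot_in_use[NUM_SLOTS];

/*
 * Toy model of installing a segment at *segno.
 *
 * With find_free = true, an occupied target is not an error: the function
 * advances *segno until it finds a free slot, giving up once it has reached
 * max_segno.  With find_free = false, the target slot is simply overwritten.
 */
static bool
install_segment(uint64_t *segno, bool find_free, uint64_t max_segno)
{
    if (*segno >= NUM_SLOTS)
        return false;           /* outside the toy directory */

    /* The real code would hold ControlFileLock from here on. */

    if (find_free)
    {
        /* The caller's target may have gone stale; search for a free one. */
        while (slot_in_use[*segno])
        {
            if (*segno >= max_segno)
                return false;   /* no free slot within the allowed range */
            (*segno)++;
            if (*segno >= NUM_SLOTS)
                return false;
        }
    }

    slot_in_use[*segno] = true; /* "install" the file at the chosen slot */

    /* The real code would release ControlFileLock here. */
    return true;
}
```

In this model, a caller that passes an occupied segment 3 with segment 4 also
occupied ends up installing at segment 5, and *segno is updated to tell the
caller so. The search and the install are meant to form one critical section,
which is why a stale target chosen before taking the lock should be harmless.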
