Re: InstallXLogFileSegment() vs concurrent WAL flush

From: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: InstallXLogFileSegment() vs concurrent WAL flush
Date: 2024-02-02 11:56:08
Message-ID: 20240202205608.7e8a6a9ed08386d218d73704@sraoss.co.jp
Lists: pgsql-hackers

On Fri, 2 Feb 2024 11:18:18 +0100
Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

> Hi,
>
> New WAL space is created by renaming a file into place. Either a
> newly created file with a temporary name or, ideally, a recyclable old
> file with a name derived from an old LSN. I think there is a data
> loss window between rename() and fsync(parent_directory). A
> concurrent backend might open(new_name), write(), fdatasync(), and
> then we might lose power before the rename hits the disk. The data
> itself would survive the crash, but recovery wouldn't be able to find
> and replay it. That might break the log-before-data rule or forget a
> transaction that has been reported as committed to a client.
>
> Actual breakage would presumably require really bad luck, and I
> haven't seen this happen or anything, it just occurred to me while
> reading code, and I can't see any existing defences.
>
> One simple way to address that would be to make XLogFileInitInternal()
> wait for InstallXLogFileSegment() to finish. It's a little

Alternatively, could we make sure the rename is durable by calling fsync
before returning the fd, as in the attached patch?
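
To illustrate the general idea only (this is not the content of the attached
patch), here is a minimal standalone sketch of making a rename durable
before anyone relies on the new name. The helper name rename_then_sync_dir
and its parent_dir argument are hypothetical; it uses plain POSIX calls
rather than PostgreSQL's own file-access routines.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not a PostgreSQL function: rename oldpath to
 * newpath, then fsync the parent directory so that the new directory
 * entry survives a power loss.  Returns 0 on success, -1 on failure
 * with errno set by the call that failed.
 */
static int
rename_then_sync_dir(const char *oldpath, const char *newpath,
                     const char *parent_dir)
{
    int dirfd;
    int rc;
    int save_errno;

    if (rename(oldpath, newpath) != 0)
        return -1;

    /* Until the fsync below completes, the new name may not be on disk. */
    dirfd = open(parent_dir, O_RDONLY);
    if (dirfd < 0)
        return -1;

    rc = fsync(dirfd);

    /* Keep the fsync error, if any, from being overwritten by close(). */
    save_errno = errno;
    close(dirfd);
    errno = save_errno;

    return (rc == 0) ? 0 : -1;
}
```

Only after such a call returns 0 would it be safe to hand out an fd for
newpath to a backend that may write and fdatasync WAL through it.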

Regards,
Yugo Nagata

> pessimistic to do that unconditionally, though, as then you have to
> wait even for rename operations for segment files later than the one
> you're opening, so I thought about how to skip waiting in that case --
> see 0002. I'm not sure if it's worth worrying about or not.

--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>

Attachment: fix_InstallXLogFileSegment_in_another_way.patch (text/x-diff, 535 bytes)