InstallXLogFileSegment() vs concurrent WAL flush

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: InstallXLogFileSegment() vs concurrent WAL flush
Date: 2024-02-02 10:18:18
Message-ID: CA+hUKGLO02j2WLiQ73iZ+CEY1G+LPmHo3PXaYTaFY9Hj222mEQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

New WAL space is created by renaming a file into place: either a
newly created file with a temporary name or, ideally, a recyclable old
file with a name derived from an old LSN. I think there is a data
loss window between rename() and fsync(parent_directory). A
concurrent backend might open(new_name), write(), fdatasync(), and
then we might lose power before the rename hits the disk. The data
itself would survive the crash, but recovery wouldn't be able to find
and replay it. That might break the log-before-data rule or forget a
transaction that has been reported as committed to a client.

Actual breakage would presumably require really bad luck, and I
haven't seen this happen. It just occurred to me while reading the
code, and I can't see any existing defences.

One simple way to address that would be to make XLogFileInitInternal()
wait for InstallXLogFileSegment() to finish. It's a little
pessimistic to do that unconditionally, though, as then you have to
wait even for rename operations for segment files later than the one
you're opening, so I thought about how to skip waiting in that case --
see 0002. I'm not sure if it's worth worrying about or not.
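The wait-and-skip idea might look roughly like the following standalone sketch. This is only a guess at the general shape, not the contents of either patch: InstallState, install_begin(), install_end(), open_must_wait() and wait_before_open() are all hypothetical names, a pthread mutex stands in for whatever lock the real code holds during installation, and it assumes that every segment below the tracked end position has already been durably installed.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for state that would live in shared memory. */
typedef struct InstallState
{
    pthread_mutex_t  install_lock;   /* held across rename + directory fsync */
    _Atomic uint64_t installed_end;  /* segments below this are durable */
} InstallState;

static InstallState install_state = {PTHREAD_MUTEX_INITIALIZER, 0};

/* Installer: take the lock before renaming a segment file into place. */
static void
install_begin(InstallState *st)
{
    pthread_mutex_lock(&st->install_lock);
}

/* Installer: call once the rename of segment 'segno' is durable. */
static void
install_end(InstallState *st, uint64_t segno)
{
    /* Only the lock holder updates installed_end, so this is race-free. */
    if (segno + 1 > atomic_load(&st->installed_end))
        atomic_store(&st->installed_end, segno + 1);
    pthread_mutex_unlock(&st->install_lock);
}

/* Opener: is segment 'segno' possibly still mid-installation? */
static bool
open_must_wait(InstallState *st, uint64_t segno)
{
    return segno >= atomic_load(&st->installed_end);
}

/* Opener: wait for any in-progress installation only when necessary. */
static void
wait_before_open(InstallState *st, uint64_t segno)
{
    if (!open_must_wait(st, segno))
        return;                 /* already durable: skip the wait */

    /* Acquire and release: returns once the current install is done. */
    pthread_mutex_lock(&st->install_lock);
    pthread_mutex_unlock(&st->install_lock);
}
```

With this shape, a backend opening a segment that is already known to be installed never blocks behind a rename for a later segment, which is the pessimism the unconditional wait would introduce.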

Attachments:
  0001-Fix-InstallXLogFileSegment-concurrency-bug.patch (application/octet-stream, 1.6 KB)
  0002-Track-end-of-installed-WAL-space-in-shared-memory.patch (application/octet-stream, 2.2 KB)
