Re: avoid multiple hard links to same WAL file after a crash

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: avoid multiple hard links to same WAL file after a crash
Date: 2022-04-08 16:53:12
Message-ID: 20220408165312.GB1409034@nathanxps13
Lists: pgsql-hackers

On Fri, Apr 08, 2022 at 10:38:03AM -0400, Robert Haas wrote:
> I see that durable_rename_excl() has the following comment: "Similar
> to durable_rename(), except that this routine tries (but does not
> guarantee) not to overwrite the target file." If those are the desired
> semantics, we could achieve them more simply and more safely by just
> trying to stat() the target file and then, if it's not found, call
> durable_rename(). I think that would be a heck of a lot safer than
> what this function is doing right now.

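IIUC it actually does guarantee that you won't overwrite the target file
when HAVE_WORKING_LINK is defined, because link() fails if the target
already exists. If it isn't defined, the function falls back to a plain
rename() and provides no guarantees at all. Using stat() before rename()
would therefore weaken this check for systems with a working link(), since
the target could be created between the two calls, but it'd probably
strengthen it for systems without a working link().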

> I'd actually be in favor of nuking durable_rename_excl() from orbit
> and putting the file-exists tests in the callers. Otherwise, someone
> might assume that it actually has the semantics that its name
> suggests, which could be pretty disastrous. If we don't want to do
> that, then I'd change it to do the stat-then-durable-rename thing
> internally, so we don't leave hard links lying around in *any* code
> path. Perhaps that's the right answer for the back-branches in any
> case, since there could be third-party code calling this function.

I think there might be another problem. The man page for rename() seems to
indicate that overwriting an existing file also introduces a window where
the old and new paths are both hard links to the same file. This isn't a problem
for the WAL files because we should never be overwriting an existing one,
but I wonder if it's a problem for other code paths. My guess is that many
code paths that overwrite an existing file are first writing changes to a
temporary file before atomically replacing the original. Those paths are
likely okay, too, as you can usually just discard any existing temporary
files.
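
For reference, the temporary-file pattern I have in mind looks roughly
like the following. It's only a sketch under my own assumptions: the
function name and the ".tmp" suffix are invented for the example, and it
skips the directory fsync that durable_rename() does.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace the contents of "path" by writing to
 * "path.tmp" first and then renaming it over the original.
 */
static int
replace_file(const char *path, const char *data, size_t len)
{
    char        tmppath[1024];
    int         fd;

    assert(path != NULL && data != NULL);

    if (snprintf(tmppath, sizeof(tmppath), "%s.tmp", path) >= (int) sizeof(tmppath))
        return -1;              /* path too long for the buffer */

    /* O_TRUNC discards any stale temporary file left by an earlier crash. */
    fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Write the new contents and flush them before exposing the file. */
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    if (close(fd) != 0)
    {
        unlink(tmppath);
        return -1;
    }

    /* Readers see either the old or the new contents, never a mix. */
    if (rename(tmppath, path) != 0)
    {
        unlink(tmppath);
        return -1;
    }
    return 0;
}
```

If a crash leaves both names pointing at the same file here, the leftover
"path.tmp" is harmless because the next call just truncates and rewrites
it.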

> Your proposed fix is OK if we don't want to do any of that stuff, but
> personally I'm much more inclined to blame durable_rename_excl() for
> being horrible than I am to blame the calling code for using it
> improvidently.

I do agree that it's worth examining this stuff a bit closer. I've
frequently found myself trying to reason about all the different states
that callers of these functions can produce, so any changes that help
simplify matters are a win in my book.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
