Re: avoid multiple hard links to same WAL file after a crash

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: avoid multiple hard links to same WAL file after a crash
Date: 2022-04-09 01:00:36
Message-ID: CA+Tgmobjke2TBrYk=Zmo=zawkG9A4saAwbe-O5wLHqY9Bz1Fag@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 8, 2022 at 12:53 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> On Fri, Apr 08, 2022 at 10:38:03AM -0400, Robert Haas wrote:
> > I see that durable_rename_excl() has the following comment: "Similar
> > to durable_rename(), except that this routine tries (but does not
> > guarantee) not to overwrite the target file." If those are the desired
> > semantics, we could achieve them more simply and more safely by just
> > trying to stat() the target file and then, if it's not found, call
> > durable_rename(). I think that would be a heck of a lot safer than
> > what this function is doing right now.
>
> IIUC it actually does guarantee that you won't overwrite the target file
> when HAVE_WORKING_LINK is defined. If not, it provides no guarantees at
> all. Using stat() before rename() would therefore weaken this check for
> systems with working link(), but it'd probably strengthen it for systems
> without a working link().

Sure, but a guarantee that holds on only some systems isn't worth
much. And, if it comes at the price of potentially having multiple
hard links to the same file in obscure situations, that seems like it
could easily cause more problems than this whole scheme can ever hope
to solve.
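
For illustration, the stat()-then-rename idea could be sketched roughly as
follows. This is only a sketch under assumptions: rename_if_absent() is a
hypothetical name, plain rename() stands in for durable_rename(), and none of
the fsync or error-reporting work that the real function does is shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not PostgreSQL code: rename oldpath to newpath only
 * if newpath does not appear to exist.  Returns 0 on success, or -1 with
 * errno set (EEXIST if the target was already present).
 *
 * The existence check and the rename are separate system calls, so another
 * process could still create newpath in between.  That matches "tries (but
 * does not guarantee) not to overwrite the target file".
 */
static int
rename_if_absent(const char *oldpath, const char *newpath)
{
    struct stat st;

    assert(oldpath != NULL && newpath != NULL);

    if (stat(newpath, &st) == 0)
    {
        /* Target exists: refuse to overwrite it. */
        errno = EEXIST;
        return -1;
    }
    if (errno != ENOENT)
        return -1;              /* stat() failed for some other reason */

    /* Assumption: the real code would call durable_rename() here. */
    return rename(oldpath, newpath);
}
```

Unlike the link()-plus-unlink() approach, this never creates a second
directory entry for the source file, so a crash partway through cannot leave
two names pointing at the same file.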

> I think there might be another problem. The man page for rename() seems to
> indicate that overwriting an existing file also introduces a window where
> the old and new path are hard links to the same file. This isn't a problem
> for the WAL files because we should never be overwriting an existing one,
> but I wonder if it's a problem for other code paths. My guess is that many
> code paths that overwrite an existing file are first writing changes to a
> temporary file before atomically replacing the original. Those paths are
> likely okay, too, as you can usually just discard any existing temporary
> files.

I wonder if this is really true. I thought rename() was supposed to be atomic.
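
For reference, the write-to-a-temporary-file-then-rename pattern mentioned in
the quoted text looks roughly like the sketch below. replace_file_contents()
and its parameters are hypothetical names, and the fsync calls a durable
version would need are deliberately left out. The sketch relies only on
rename() replacing an existing target; it does not settle whether old and new
names can briefly coexist during the call.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: replace the contents of "path" by writing the new
 * data to "tmppath" and then renaming the temporary file over the target.
 * Returns 0 on success, -1 on failure.
 *
 * A leftover tmppath from an earlier crash is simply overwritten, which is
 * why such paths can usually discard any existing temporary file.
 */
static int
replace_file_contents(const char *path, const char *tmppath, const char *data)
{
    FILE *f;

    assert(path != NULL && tmppath != NULL && data != NULL);

    f = fopen(tmppath, "w");    /* truncates any stale temporary file */
    if (f == NULL)
        return -1;

    if (fputs(data, f) == EOF)
    {
        fclose(f);
        remove(tmppath);
        return -1;
    }
    if (fclose(f) != 0)
    {
        remove(tmppath);
        return -1;
    }

    /* Readers of "path" see either the old contents or the new ones. */
    if (rename(tmppath, path) != 0)
    {
        remove(tmppath);
        return -1;
    }
    return 0;
}
```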

--
Robert Haas
EDB: http://www.enterprisedb.com
