Re: avoid multiple hard links to same WAL file after a crash

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: avoid multiple hard links to same WAL file after a crash
Date: 2022-04-08 17:05:19
Message-ID: 20220408170519.GA1409463@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Apr 08, 2022 at 09:53:12AM -0700, Nathan Bossart wrote:
> On Fri, Apr 08, 2022 at 10:38:03AM -0400, Robert Haas wrote:
>> I'd actually be in favor of nuking durable_rename_excl() from orbit
>> and putting the file-exists tests in the callers. Otherwise, someone
>> might assume that it actually has the semantics that its name
>> suggests, which could be pretty disastrous. If we don't want to do
>> that, then I'd changing to do the stat-then-durable-rename thing
>> internally, so we don't leave hard links lying around in *any* code
>> path. Perhaps that's the right answer for the back-branches in any
>> case, since there could be third-party code calling this function.
> I think there might be another problem. The man page for rename() seems to
> indicate that overwriting an existing file also introduces a window where
> the old and new path are hard links to the same file. This isn't a problem
> for the WAL files because we should never be overwriting an existing one,
> but I wonder if it's a problem for other code paths. My guess is that many
> code paths that overwrite an existing file are first writing changes to a
> temporary file before atomically replacing the original. Those paths are
> likely okay, too, as you can usually just discard any existing temporary
> files.

Ha, so there are only a few callers of durable_rename_excl() in the
PostgreSQL tree. One is basic_archive.c, which is already doing a stat()
check. IIRC I only used durable_rename_excl() here to handle the case
where multiple servers are writing archives to the same location. If that
happened, the archiver process would begin failing. If a crash left two
hard links to the same file around, we will silently succeed the next time
around thanks to the compare_files() check. Besides the WAL installation
code, the only other callers are in timeline.c, and both note that the use
of durable_rename_excl() is for "paranoidly trying to avoid overwriting an
existing file (there shouldn't be one)."

So AFAICT basic_archive.c is the only caller with a strong reason for using
durable_rename_excl(), and even that might not be worth keeping it around.

Nathan Bossart
Amazon Web Services:

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Wong 2022-04-08 17:34:00 Re: trigger example for plsample
Previous Message Joshua Brindle 2022-04-08 17:04:22 Re: [PoC/RFC] Multiple passwords, interval expirations