Re: avoid multiple hard links to same WAL file after a crash

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: avoid multiple hard links to same WAL file after a crash
Date: 2022-04-27 18:42:04
Message-ID: 20220427184204.GB3222843@nathanxps13
Lists: pgsql-hackers

On Wed, Apr 27, 2022 at 04:09:20PM +0900, Michael Paquier wrote:
> I am not sure that we have any need to backpatch this change based on the
> unlikeliness of the problem, TBH. One thing that is itching me a bit,
> like Robert upthread, is that we don't check anymore that the newfile
> does not exist in the code paths because we never expect one. It is
> possible to use stat() for that. But access() within a simple
> assertion would be simpler? Say something like:
> Assert(access(path, F_OK) != 0 && errno == ENOENT);
> The case for basic_archive is limited as the comment of the patch
> states, but that would be helpful for the two calls in timeline.c and
> the one in xlog.c in the long-term. And this has no need to be part
> of fd.c, this can be added before the durable_rename() calls. What do
> you think?
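The assertion Michael proposes can be sketched as a small standalone C helper. This is a hedged illustration, not the patch itself: `rename_expect_absent()` is a hypothetical name, and the standard `assert()` and `rename()` stand in for PostgreSQL's `Assert()` and `durable_rename()` (the real `durable_rename()` also fsyncs the file and its parent directory).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): move oldpath to newpath,
 * asserting in debug builds that newpath does not exist yet.
 * assert()/rename() stand in for PostgreSQL's Assert()/durable_rename().
 */
static int
rename_expect_absent(const char *oldpath, const char *newpath)
{
	/*
	 * access() must fail, and fail specifically because the file is
	 * missing; errno == ENOENT rules out other errors such as EACCES.
	 */
	assert(access(newpath, F_OK) != 0 && errno == ENOENT);

	return rename(oldpath, newpath);
}
```

Because the check lives inside an assertion, it compiles away when `NDEBUG` is defined (in PostgreSQL, when assertions are not enabled), so it documents the "target never exists" expectation without adding a system call to production builds.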

Here is a new patch set with these assertions added. I think at least the
xlog.c change ought to be back-patched. The problem may be unlikely, but
AFAICT the possible consequences include WAL corruption.
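The hazard in the subject line can be reproduced in miniature. Assuming, as the thread's subject and patch names suggest, that `durable_rename_excl()` got its "don't overwrite" behavior from `link()` followed by `unlink(oldpath)`, a crash between those two calls leaves two directory entries for one inode. The sketch below simulates that crash by simply skipping the `unlink()`; both helper names are hypothetical. For contrast, `rename()` moves the name in a single call and never gives the file a second link.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical illustration of an exclusive "link, then unlink" move
 * (assumed, not confirmed, to match the old durable_rename_excl())
 * that stops after link(), as if the server crashed between the calls.
 * Returns the resulting link count, or -1 on error.
 */
static long
interrupted_link_move(const char *oldpath, const char *newpath)
{
	struct stat st_old, st_new;

	/* link() fails with EEXIST if newpath exists: the "exclusive" part. */
	if (link(oldpath, newpath) != 0)
		return -1;

	/* The next step would be unlink(oldpath); a crash here skips it. */

	if (stat(oldpath, &st_old) != 0 || stat(newpath, &st_new) != 0)
		return -1;

	/* Both names now refer to the same file. */
	assert(st_old.st_ino == st_new.st_ino && st_old.st_dev == st_new.st_dev);
	return (long) st_new.st_nlink;
}

/* rename() moves the name in one call; the file keeps a single link. */
static long
rename_move(const char *oldpath, const char *newpath)
{
	struct stat st;

	if (rename(oldpath, newpath) != 0 || stat(newpath, &st) != 0)
		return -1;
	return (long) st.st_nlink;
}
```

If a WAL segment were left with two names like this, later recycling or overwriting one name would silently change the contents seen under the other, which is one plausible route to the corruption mentioned above.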

Nathan Bossart
Amazon Web Services

Attachments:
v4-0001-Replace-calls-to-durable_rename_excl-with-durable.patch (text/x-diff, 4.8 KB)
v4-0002-Remove-durable_rename_excl.patch (text/x-diff, 4.0 KB)
