Re: avoid multiple hard links to same WAL file after a crash

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: avoid multiple hard links to same WAL file after a crash
Date: 2022-04-08 14:38:03
Message-ID: CA+TgmoZ5VHuv_BzpH7jrgyLu1Kymo=Xom5xFOeGf6xXP9BLhkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 7, 2022 at 2:30 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> Presently, WAL recycling uses durable_rename_excl(), which notes that a
> crash at an unfortunate moment can result in two links to the same file.
> My testing [1] demonstrated that it was possible to end up with two links
> to the same file in pg_wal after a crash just before unlink() during WAL
> recycling. Specifically, the test produced links to the same file for the
> current WAL file and the next one because the half-recycled WAL file was
> re-recycled upon restarting. This seems likely to lead to WAL corruption.

Wow, that's bad.

> The attached patch prevents this problem by using durable_rename() instead
> of durable_rename_excl() for WAL recycling. This removes the protection
> against accidentally overwriting an existing WAL file, but there shouldn't
> be one.

I see that durable_rename_excl() has the following comment: "Similar
to durable_rename(), except that this routine tries (but does not
guarantee) not to overwrite the target file." If those are the desired
semantics, we could achieve them more simply and more safely by just
trying to stat() the target file and then, if it's not found, call
durable_rename(). I think that would be a heck of a lot safer than
what this function is doing right now.

I'd actually be in favor of nuking durable_rename_excl() from orbit
and putting the file-exists tests in the callers. Otherwise, someone
might assume that it actually has the semantics that its name
suggests, which could be pretty disastrous. If we don't want to do
that, then I'd changing to do the stat-then-durable-rename thing
internally, so we don't leave hard links lying around in *any* code
path. Perhaps that's the right answer for the back-branches in any
case, since there could be third-party code calling this function.

Your proposed fix is OK if we don't want to do any of that stuff, but
personally I'm much more inclined to blame durable_rename_excl() for
being horrible than I am to blame the calling code for using it
improvidently.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thom Brown 2022-04-08 14:44:51 Re: [COMMITTERS] pgsql: Allow time delayed standbys and recovery
Previous Message Amul Sul 2022-04-08 14:27:03 Re: [Patch] ALTER SYSTEM READ ONLY