Re: Replace O_EXCL with O_TRUNC for creation of state.tmp in SaveSlotToPath

From: Kevin K Biju <kevinkbiju(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Replace O_EXCL with O_TRUNC for creation of state.tmp in SaveSlotToPath
Date: 2025-10-15 14:00:39
Message-ID: CAM45KeGUXC1+P2TZVTR5WDoYKjppUoTf50V1wVUU2ZgHOgvyeg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Michael,

Thanks for the context. I missed the earlier discussions about the same
issue. Using unlink in the error paths makes sense to me. There is an edge
case in my mind, in case unlink fails as well, and we end up in the same
condition; however, the chance of that occurring is sufficiently low.

It looks like this change was already committed in
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0adf424b490b701ae8eeeca3d9630773f18cd937.
Thanks for the quick response!

Kevin

On Fri, Oct 10, 2025 at 6:22 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Thu, Oct 09, 2025 at 05:02:12PM +0900, Michael Paquier wrote:
> > An alternative fix that we can do here instead is to unlink() the
> > temporary file when reaching on these error code paths, allowing
> > future accesses to work correctly. This was suggested as a second
> > solution, other than the O_TRUNC objected to. One thing is to make
> > sure that the unlinks are done while holding the lwlock for the IO in
> > progress. So, something like the attached should also solve your
> > problem.
>
> I have been playing a bit more with that, and applied 912af1c7e9c9 to
> do the unlink() calls down to v13. While on it, I have also played
> with hard crashes timed while we are in the middle of SaveSlotToPath()
> with state.tmp still around at restart (injection point wait just
> before the rename(), for example), and double-checked that recovery is
> able to do a correct cleanup job. I didn't fully recall this last
> part, as it has been a couple of years since the last report.
>
> There may be an argument for having an automated test, like:
> - Physical slot creation.
> - Use a bit the slot.
> - Injection point wait before the rename() of SaveSlotToPath().
> - Checkpoint, that would not be able to finish.
> - Hard crash.
> - Restart, check that the slot state is correct after recovery.
>
> However, I am not sure that this would be worth the cycles spent on.
> There are a lot more interesting scenarios for write failures in the
> tree than this one.
> --
> Michael
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2025-10-15 14:05:29 Re: Reorganize GUC structs
Previous Message Tom Lane 2025-10-15 13:58:33 Re: remove pg_restrict workaround