Re: stat() vs ERROR_DELETE_PENDING, round N + 1

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: stat() vs ERROR_DELETE_PENDING, round N + 1
Date: 2021-09-05 21:44:37
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 2021-09-06 01:32:55 +1200, Thomas Munro wrote:
> It's a good idea. I tested it and it certainly does fix the
> basebackup problem I've seen (experimental patch attached). But,
> yeah, I'm also a bit worried that that path could be fragile and need
> special handling in lots of places.

It's also expensive-ish.

> I also tried writing a new open() wrapper using the lower level
> NtCreateFile() interface, and then an updated stat() wrapper built on
> top of that. As a non-Windows person, getting that to (mostly) work
> involved a fair amount of suffering. I can share that if someone is
> interested, but while learning about that family of interfaces, I
> realised we could keep the existing Win32-based code, but also
> retrieve the NT status, leading to a very small change (experimental
> patch attached).

Is it guaranteed, or at least reliable, that the status we fetch with
RtlGetLastNtStatus is actually from the underlying filesystem operation,
rather than some other work that happens during the win32->nt translation?
E.g. a memory allocation or such? Presumably most of such work happens before
the actual nt "syscall", but ...

> The best idea is probably to set FILE_DISPOSITION_DELETE |
> FILE_DISPOSITION_POSIX_SEMANTICS before unlinking. This appears to be
> a silver bullet, but isn't available on ancient Windows releases that
> we support, or file systems other than local NTFS. So maybe we need a
> combination of that + STATUS_DELETE_PENDING as shown in the attached.
> I'll look into that next.

When was that introduced?

I'd be ok to only fix these bugs on e.g. Win10, Win Server 2019, Win Server
2016 or such. I don't think we need to support OSs that the vendor doesn't
support - and I wouldn't count "only security fixes" as support in this
main extended
Windows 10 Oct 14, 2025
Windows Server 2019 Jan 9, 2024 Jan 9, 2029
Windows Server 2016 Jan 11, 2022 Jan 12, 2027
Windows 7 Jan 13, 2015 Jan 14, 2020
Windows Vista Apr 10, 2012 Apr 11, 2017

One absurd detail here is that the deault behaviour changed sometime in
Windows 10's lifetime:

"The behavior changed in recent releases of Windows 10 -- without notice
AFAIK. DeleteFileW now tries to use POSIX semantics if the filesystem supports
it. NTFS does."

> #ifndef FRONTEND
> - Assert(pgwin32_signal_event != NULL); /* small chance of pg_usleep() */
> + /* XXX When called by stat very early on, this fails! */
> + //Assert(pgwin32_signal_event != NULL); /* small chance of pg_usleep() */
> #endif

Perhaps we should move the win32 signal initialization into StartupHacks()?
There's some tension around it using ereport(), and MemoryContextInit() only
being called a tad later, but that seems resolvable.

> + * Our open wrapper will report STATUS_DELETE_PENDING as ENOENT. We pass
> + * in a special private flag to say that it's _pgstat64() calling, to
> + * activate a mode that allows directories to be opened for limited
> + * purposes.
> + *
> + * XXX Think about fd pressure, since we're opening an fd?
> */

If I understand
etc correctly, it looks like there is. But only at the point we do
_open_osfhandle(). So perhaps we should a pgwin32_open() version returning a
handle and make pgwin32_open() a wrapper around that?


Andres Freund

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-09-05 21:55:00 Re: stat() vs ERROR_DELETE_PENDING, round N + 1
Previous Message Andres Freund 2021-09-05 20:28:00 Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints