Re: Understanding, testing and improving our Windows filesystem code

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Understanding, testing and improving our Windows filesystem code
Date: 2022-10-20 04:54:47
Message-ID: CA+hUKG+r07Zy6AHroUDZm9k743a_y3r-puTnhKEM2zuFQymYkw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 18, 2022 at 10:00 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> * has anyone got a relevant filesystem where this fails? which way
> do ReFS and SMB go? do the new calls in 0010 just fail, and if so
> with which code (ie could we add our own fallback path)?

Andres kindly ran these tests on some Win 10 and Win 11 VMs he had
with non-NTFS filesystems, so I can report:

NTFS: have_posix_unlink_semantics == true, tests passing

ReFS: have_posix_unlink_semantics == false, tests passing

SMB: have_posix_unlink_semantics == false, symlink-related tests
failing (our junction points are rejected) + one readdir() test
failing (SMB introduces a semantic difference: it can't see
STATUS_DELETE_PENDING zombies).

I think this means that PostgreSQL probably mostly works on SMB today,
except you can't create tablespaces, and therefore our regression
tests etc already can't pass there, and there may be a few extra
ENOTEMPTY race conditions due to readdir()'s different behaviour.
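
To make "POSIX unlink semantics" concrete, here is a minimal sketch of
the kind of probe that distinguishes the two behaviours. It is not the
have_posix_unlink_semantics check from the test patch: the function
name probe_posix_unlink_semantics and the scratch directory name are
made up for illustration, and it assumes plain POSIX calls (mkdir,
open, unlink, rmdir) as found on Unix. It unlinks a file that is still
open and then checks whether the containing directory can be removed
straight away.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical probe, not taken from the patch set.
 *
 * Returns 1 if unlinking an open file removes its name immediately,
 * 0 if the name lingers (or the unlink is refused) while the file is
 * open, and -1 if the scratch directory or file could not be set up.
 */
int
probe_posix_unlink_semantics(const char *parent)
{
	char		dir[1024];
	char		file[1100];
	int			fd;
	int			n;
	int			result;

	assert(parent != NULL);

	/* Build a scratch directory name under the caller's parent directory. */
	n = snprintf(dir, sizeof(dir), "%s/unlink_probe_%ld",
				 parent, (long) getpid());
	if (n < 0 || (size_t) n >= sizeof(dir))
		return -1;
	snprintf(file, sizeof(file), "%s/f", dir);

	if (mkdir(dir, 0700) != 0)
		return -1;
	fd = open(file, O_CREAT | O_EXCL | O_RDWR, 0600);
	if (fd < 0)
	{
		rmdir(dir);
		return -1;
	}

	/* Unlink the file while the descriptor is still open. */
	if (unlink(file) != 0)
	{
		/* Refused outright: clean up after closing and report 0. */
		close(fd);
		unlink(file);
		rmdir(dir);
		return 0;
	}

	/* With POSIX semantics the directory is already empty here. */
	result = (rmdir(dir) == 0) ? 1 : 0;

	close(fd);
	if (!result)
		rmdir(dir);		/* retry now that nothing holds the file open */
	return result;
}
```

On a Unix system probe_posix_unlink_semantics("/tmp") should return 1.
A filesystem where the unlinked name stays visible until the last
handle is closed would give 0, and that lingering name is what turns a
directory removal into an ENOTEMPTY race.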

> * if there are any filesystems that don't support POSIX-semantics,
> would we want to either (1) get such a thing into the build farm so
> it's tested or (2) de-support non-POSIX-semantics filesystems by
> edict, and drop a lot of code and problems that everyone hates?

Yes, yes there are, so this question comes up. Put another way:

I guess that almost all users of PostgreSQL on Windows are using NTFS.
Some are getting partial POSIX semantics already, and some are not,
depending on the Windows variant. If we commit the 0010 patch, all
supported OSes will get full POSIX unlink semantics on NTFS. That'd
leave just ReFS and SMB users (are there any other relevant
filesystems?) in the cold with non-POSIX semantics. Do we want to
claim that we support those filesystems? If so, I guess we'd need an
animal and perhaps also optional CI with ReFS. (Though ReFS may
eventually get POSIX semantics too, I have no idea about that.) If
not, we could in theory rip out various code we have to cope with the
non-POSIX unlink semantics, and completely forget about that whole
category of problem.

Changes in this version:
* try to avoid tests that do bad things that crash if earlier tests
failed (I learned that close(-1) aborts in debug builds)
* add fallback paths in 0010 (I learned what errors are raised on lack
of POSIX support)
* fix MinGW build problems

As far as I could tell, MinGW doesn't have a struct definition we
need, and it seems to want _WIN32_WINNT >= 0x0A000002 to see
FileRenameInfoEx, which looks weird to me. (I'm not sure about that,
but I think it was perhaps supposed to be 0x0A02, and even that
isn't necessary with the MSVC SDK headers.) I gave up researching that
and put the definitions I needed into the code.

Attachment                                                       Content-Type  Size
v2-0001-Add-suite-of-macros-for-writing-TAP-tests-in-C.patch     text/x-patch  4.8 KB
v2-0002-meson-Add-infrastructure-for-TAP-tests-written-in.patch  text/x-patch  4.5 KB
v2-0003-Fix-symlink-errno-in-Windows-replacement-code.patch      text/x-patch  1.3 KB
v2-0004-Fix-readlink-return-value-on-Windows.patch               text/x-patch  795 bytes
v2-0005-Add-tests-for-Windows-filesystem-code-in-src-port.patch  text/x-patch  32.1 KB
v2-0006-Fix-lstat-on-broken-junction-points.patch                text/x-patch  5.6 KB
v2-0007-Fix-readlink-for-non-PostgreSQL-created-junction-.patch  text/x-patch  4.5 KB
v2-0008-Fix-stat-for-recursive-junction-points-on-Windows.patch  text/x-patch  4.9 KB
v2-0009-Fix-unlink-for-STATUS_DELETE_PENDING-on-Windows.patch    text/x-patch  5.2 KB
v2-0010-Use-POSIX-semantics-for-unlink-and-rename-on-Wind.patch  text/x-patch  10.6 KB
