Re: Windows now has fdatasync()

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Dave Page <dpage(at)pgadmin(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Windows now has fdatasync()
Date: 2022-08-10 01:37:16
Message-ID: CA+hUKG+a-7r4GpADsasCnuDBiqC1c31DAQQco2FayVtB9V3sQw@mail.gmail.com

David kindly ran some tests of this thing on real hardware. The
results were mostly in line with expectations, but we learned some new
things.

TL;DR We should probably consider fdatasync as a safer default for
wal_sync_method on Windows, but it'd be good for someone more hands-on
with this OS and knowledgeable about storage to investigate and
propose that. My original goal here was
primarily Unix/Windows harmonisation and cleanup since I'm doing a
bunch of hacking on I/O, but I can't unsee an
unsafe-at-least-on-consumer-gear default now that I've seen it. The
main thing I'm aware of that we don't know yet is what happens if you
try it on a non-NTFS file system (ReFS? SMB?) -- hopefully it falls
back to fsync behaviour.

Observations from an old Windows 8.1 system with a SATA drive:

1. So far you can apparently still actually compile and run on 8.1,
despite recent commits to de-support it.
2. You can use the new wal_sync_method=fdatasync, without error, and
timings are consistent with falling back to full fsync behaviour.
That makes sense, I guess, because the underlying flush function
already existed there. It's just a new flag bit, and the default
behaviour for flags == 0 was already equivalent to fsync. That seems
like a good outcome even though 8.1 isn't a target anymore.

Observations from a current Windows 11 system with an NVMe drive:

1. fdatasync is faster than fsync, as expected: about twice as fast
with write cache disabled (3263 vs 1642 ops/sec), and roughly 19%
faster with write cache enabled (3065 vs 2577 ops/sec).
2. Timings seem to suggest that open_datasync (the current default)
is not really writing through the drive cache. I'd previously thought
that was a SATA-only problem based on [1], which said that EIDE/SATA
drivers did not pass through the FUA flag that NTFS sends for
FILE_FLAG_WRITE_THROUGH (= O_DSYNC), on the basis that many drives
ignored it anyway. These numbers, however, seem to suggest that
David's recent-ish NVMe system has the same problem as the old SATA
system.
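
As a quick sanity check on the two observations above, the ratios can
be recomputed from the ops/sec figures reported further down in this
message. This is only a small arithmetic sketch; the helper names
(speedup, usecs_per_op) are made up for illustration and are not part
of pg_test_fsync.

```python
# Throughput figures (ops/sec) copied from the Windows 11 results in this message.
results = {
    "write cache enabled": {
        "open_datasync": 27306.286,
        "fdatasync": 3065.428,
        "fsync": 2577.498,
    },
    "write cache disabled": {
        "open_datasync": 3477.258,
        "fdatasync": 3263.418,
        "fsync": 1641.502,
    },
}


def speedup(ops, faster, slower):
    """Ratio of two methods' throughput within one configuration."""
    return ops[faster] / ops[slower]


def usecs_per_op(ops_per_sec):
    """Convert throughput to per-operation latency in microseconds."""
    return 1_000_000 / ops_per_sec


for config, ops in results.items():
    print(config)
    print(f"  fdatasync vs fsync:         {speedup(ops, 'fdatasync', 'fsync'):.2f}x")
    print(f"  open_datasync vs fdatasync: {speedup(ops, 'open_datasync', 'fdatasync'):.2f}x")
```

With the cache disabled, open_datasync and fdatasync land within
about 7% of each other, while with the cache enabled open_datasync is
nearly 9 times faster, which is the gap that suggests its writes are
being absorbed by the drive cache.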

Generally, Windows' approach seems to be that NTFS
FILE_FLAG_WRITE_THROUGH fires an FUA flag into the storage stack, and
either the driver or the drive is free to fling it out the window, and
it's the user's problem to worry about that, whereas Linux at least
asks nicely if the drive understands FUA and falls back to flushing
the whole cache if not[2]. I also know that Linux has been flaky
around this in the past too, especially on consumer storage, and macOS
and at least some of the older BSD/UFS systems just don't do this
stuff at all for user data (yet) so it's not like there is anything
universal about this topic. Note that drive caches are enabled by
default in Windows, and our manual does already tell you about this
problem[3].
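
For the Linux side of that comparison, the block layer exposes what it
believes about a device through sysfs, which gives a rough way to see
whether FUA would be used or a full cache flush would be needed. The
sketch below is an illustration only and was not part of the testing
in this thread: it assumes the queue/write_cache and queue/fua
attributes are present (they are on reasonably recent kernels, as far
as I know), and the function names are hypothetical.

```python
from pathlib import Path


def read_queue_attr(device, attr, sysfs_root="/sys/block"):
    """Return the stripped contents of <sysfs_root>/<device>/queue/<attr>,
    or None if the attribute cannot be read."""
    path = Path(sysfs_root) / device / "queue" / attr
    try:
        return path.read_text().strip()
    except OSError:
        return None


def describe_device(device, sysfs_root="/sys/block"):
    """Collect the cache-related attributes for one block device.

    write_cache is expected to read "write back" or "write through";
    fua is expected to read "1" if the driver advertises FUA support.
    """
    return {
        "write_cache": read_queue_attr(device, "write_cache", sysfs_root),
        "fua": read_queue_attr(device, "fua", sysfs_root),
    }


if __name__ == "__main__":
    root = Path("/sys/block")
    if root.is_dir():
        for dev in sorted(p.name for p in root.iterdir()):
            print(dev, describe_device(dev))
```

These values only report what the kernel was told by the driver;
whether the drive itself honours FUA is a separate question, as noted
above.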

One thing to note about the numbers below: pg_test_fsync.c's
open_datasync test is also using FILE_FLAG_NO_BUFFERING (= O_DIRECT),
unlike PostgreSQL, which muddies the waters slightly. (There was a
patch upthread to fix that and report both numbers; I may come back to
that.)

Windows 11, NVMe, write cache enabled:

        open_datasync     27306.286 ops/sec      37 usecs/op
        fdatasync          3065.428 ops/sec     326 usecs/op
        fsync              2577.498 ops/sec     388 usecs/op

Windows 11, NVMe, write cache disabled:

        open_datasync      3477.258 ops/sec     288 usecs/op
        fdatasync          3263.418 ops/sec     306 usecs/op
        fsync              1641.502 ops/sec     609 usecs/op

Windows 8.1, SATA:

        open_datasync     19934.532 ops/sec      50 usecs/op
        fdatasync           231.429 ops/sec    4321 usecs/op
        fsync               240.050 ops/sec    4166 usecs/op

(We couldn't figure out how to disable the write cache on the 8.1
machine -- the usual checkbox had no effect -- but we didn't waste
time investigating that old system beyond the curiosity of checking if
it'd work at all.)
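
For readers who want to see what the three rows above are comparing,
here is a minimal POSIX-only sketch of the same idea: overwrite one
8kB block repeatedly and make it durable via O_DSYNC, fdatasync() or
fsync(). It is not pg_test_fsync and does not use the Windows calls
discussed in this thread; it assumes a Unix-like system where Python
provides os.O_DSYNC, os.pwrite and os.fdatasync, it does not use
O_DIRECT, and the function name time_sync_method is made up.

```python
import os
import tempfile
import time

BLOCK = b"\0" * 8192  # one 8kB write per operation


def time_sync_method(path, method, ops=100):
    """Return ops/sec for `ops` overwrite-then-sync cycles using `method`."""
    if method not in ("open_datasync", "fdatasync", "fsync"):
        raise ValueError(f"unknown sync method: {method}")
    flags = os.O_RDWR | os.O_CREAT
    if method == "open_datasync":
        # With O_DSYNC each write() itself waits for the data to be reported durable.
        flags |= os.O_DSYNC
    fd = os.open(path, flags, 0o600)
    try:
        # Write the block once up front so the timed loop only overwrites it.
        os.pwrite(fd, BLOCK, 0)
        os.fsync(fd)
        start = time.perf_counter()
        for _ in range(ops):
            os.pwrite(fd, BLOCK, 0)
            if method == "fdatasync":
                os.fdatasync(fd)
            elif method == "fsync":
                os.fsync(fd)
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
    return ops / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "sync_test.dat")
        for method in ("open_datasync", "fdatasync", "fsync"):
            rate = time_sync_method(target, method)
            print(f"{method:<16}{rate:12.3f} ops/sec {1_000_000 / rate:7.0f} usecs/op")
```

As with the real tool, the numbers only mean something if the file
lives on the storage you care about, and a suspiciously fast
open_datasync result is a hint that something below is caching the
writes.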

[1] https://devblogs.microsoft.com/oldnewthing/20170510-00/?p=95505
[2] https://techcommunity.microsoft.com/t5/sql-server-blog/sql-server-on-linux-forced-unit-access-fua-internals/ba-p/3199102
[3] https://www.postgresql.org/docs/devel/wal-reliability.html
