Re: Final(?) proposal for wal_sync_method changes

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Final(?) proposal for wal_sync_method changes
Date: 2010-12-08 06:48:56
Message-ID: AANLkTikGVPfa9KTGdGhEGarw3-2gM8vrdRmBTTTRhfmj@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 8, 2010 at 02:07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>>> I am unclear as to the reason why there is a test for
>>>> HAVE_FSYNC_WRITETHROUGH_ONLY in pg_fsync().  Perhaps that is also
>>>> leftover from a previous vision of how this all works?  Or does an
>>>> fsync() call actually fail on Windows?
>
>>> No, fsync responds fine.  It just doesn't actually sync to disk.

First of all a warning - I'm writing this on way too little sleep :-)
Blame pgday.eu...

> Sigh ... The closer I look at the Windows code path here, the more of an
> inconsistent, badly documented spaghetti-heap it appears to be.  So far
> as a quick Google search unearths, there is no fsync() primitive on
> Windows.  What we have actually got is this gem in port/win32.h:

Correct.

> /*
>  *      Even though we don't support 'fsync' as a wal_sync_method,
>  *      we do fsync() a few other places where _commit() is just fine.
>  */
> #define fsync(fd) _commit(fd)
>
> So actually, there is no difference between selecting fsync and
> fsync_writethrough on Windows, this comment and the SGML documentation
> to the contrary.  Both settings result in invoking _commit() and
> presumably are safe.  One wonders why we bothered to invent a separate
> fsync_writethrough setting on Windows.

IIRC, using _commit(fd) *is* fsync_writethrough. That's what we
shipped with. It even writes through the cache on a RAID controller
that has BBU'ed write-cache. We had to implement the *other* options
in order to "lower" the safety (it doesn't actually lower the safety
*if* you have a BBU, which is a very good use-case for those options).

> What this means is that switching to a simple preference order
> "fdatasync, then fsync" will result in choosing fsync on Windows (since
> it hasn't got fdatasync), meaning _commit, meaning Windows users see
> a behavioral change after all.
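
To spell out the quoted concern, here is a minimal sketch of a
"fdatasync, then fsync" preference order. It is not the actual
PostgreSQL code: pick_default_sync(), sync_choice_name() and SyncChoice
are names made up for illustration, and the two flags stand in for
whatever configure-time checks the real code would use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only; not PostgreSQL's actual code. */
typedef enum { SYNC_FDATASYNC, SYNC_FSYNC, SYNC_NONE } SyncChoice;

/* "fdatasync, then fsync": take the first primitive the platform offers. */
SyncChoice pick_default_sync(bool have_fdatasync, bool have_fsync)
{
    if (have_fdatasync)
        return SYNC_FDATASYNC;
    if (have_fsync)
        return SYNC_FSYNC;
    return SYNC_NONE;
}

/* Human-readable name for a choice, e.g. for logging. */
const char *sync_choice_name(SyncChoice c)
{
    switch (c) {
        case SYNC_FDATASYNC: return "fdatasync";
        case SYNC_FSYNC:     return "fsync";
        case SYNC_NONE:      return "none";
    }
    assert(!"unknown SyncChoice");
    return "unknown";
}
```

For the Windows case described in the quote (no fdatasync, but an
fsync() macro that expands to _commit()), pick_default_sync(false, true)
returns SYNC_FSYNC, so the default would end up on the _commit() path.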

_commit() != fsync()

I think this is the discussion and subsequent changes:

http://archives.postgresql.org/pgsql-patches/2005-03/msg00230.php

> Would someone verify via pgbench or similar test (*not* test_fsync) that
> on Windows, wal_sync_method = fsync or fsync_writethrough perform the
> same (ie tps ~= disk rotation rate) while open_datasync is too fast to
> be real?  I'm losing confidence that I've found all the spaghetti ends
> here, and I don't have a Windows setup to try it myself.

Please note that if you're re-verifying this, verify both on a crappy
disk *and* on a proper BBU'ed RAID controller. The reason for this
originally was that we performed about the same on those two, which
made no sense...
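
For anyone running that check, the "tps ~= disk rotation rate" rule of
thumb from the quoted request can be written down as a small sketch. It
assumes a single client, one WAL flush per commit, and a plain disk with
no write cache, so each commit waits for roughly one platter rotation.
The helper names and the slack factor are made up for illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper: rough upper bound on single-client commits/sec
 * when every commit has to wait for one platter rotation.
 */
double max_commits_per_sec(double rpm)
{
    assert(rpm > 0.0);
    return rpm / 60.0;          /* rotations per second */
}

/*
 * Hypothetical helper: nonzero if the measured tps is far enough above
 * the rotation bound (by the given slack factor) that the writes are
 * probably not reaching the platter before commit returns.
 */
int too_fast_to_be_real(double measured_tps, double rpm, double slack)
{
    return measured_tps > max_commits_per_sec(rpm) * slack;
}
```

A 7200 rpm disk gives a bound of 120 commits/sec, so a result in the
thousands on such a disk would be suspicious. On a BBU'ed controller a
high number is expected and legitimate, which is why both setups need
to be tested.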

Merlin, IIRC you did a lot of the testing around this - do you recall
any more details?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
