Re: O_DIRECT for WAL writes

From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: O_DIRECT for WAL writes
Date: 2005-05-30 08:04:48
Message-ID: 429AC920.6080809@cheapcomplexdevices.com
Lists: pgsql-hackers, pgsql-patches
Tom Lane wrote:
> Neil Conway <neilc(at)samurai(dot)com> writes:
>>is opening a file with O_DIRECT sufficient to ensure that
>>a write(2) does not return until the data has hit disk?
> 
> Some googling suggests so, eg
> http://www.die.net/doc/linux/man/man2/open.2.html

Really?  On that page I read:
  "O_DIRECT...at the completion of the read(2) or write(2)
   system call, data is guaranteed to have been transferred."
which sounds to me like the data is transferred to the device's
cache, but not necessarily flushed through that cache.
It says nothing about physical media.  That wording feels
different to me from O_SYNC which reads:
  "O_SYNC will block the calling process until the data has
   been physically written to the underlying hardware."
which does suggest to me that it writes to physical media.
Or am I reading that wrong?
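
To make the distinction concrete, here is a minimal sketch in C of the
conservative reading: treat O_DIRECT only as "bypass the page cache" and
still ask for a flush explicitly with fdatasync(). The helper name
direct_write_then_sync and the 4096-byte alignment are assumptions made
for illustration, not anything taken from the PostgreSQL source or the
man page; whether the flush actually reaches physical media still
depends on the kernel, the filesystem, and the drive.

```c
#define _GNU_SOURCE           /* exposes O_DIRECT on Linux/glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_ALIGN 4096      /* assumed alignment; real devices may differ */

/* Hypothetical helper: write len bytes to path, bypassing the page cache
 * where the filesystem allows it, then explicitly request a flush.
 * The data is zero-padded up to a multiple of BLOCK_ALIGN.
 * Returns 0 on success, -1 on any failure. */
int direct_write_then_sync(const char *path, const void *data, size_t len)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    int fd = open(path, flags | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)        /* filesystem rejects O_DIRECT */
        fd = open(path, flags, 0600);     /* fall back to buffered I/O */
    if (fd < 0)
        return -1;

    /* O_DIRECT generally needs an aligned buffer and an aligned length. */
    size_t padded = (len + BLOCK_ALIGN - 1) / BLOCK_ALIGN * BLOCK_ALIGN;
    if (padded == 0)
        padded = BLOCK_ALIGN;

    void *buf = NULL;
    if (posix_memalign(&buf, BLOCK_ALIGN, padded) != 0) {
        close(fd);
        return -1;
    }
    memset(buf, 0, padded);
    memcpy(buf, data, len);

    int rc = 0;
    if (write(fd, buf, padded) != (ssize_t) padded)
        rc = -1;                          /* short or failed write */
    else if (fdatasync(fd) != 0)
        rc = -1;                          /* do not rely on O_DIRECT alone */

    free(buf);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

If O_DIRECT really did imply a flush to media, the fdatasync() call
would be redundant but harmless; if it does not, that call is the only
thing in this sketch that requests one.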



PS: I've gotten way out of my depth here, but...

     ...attempting to browse the Linux source(!!)

   Looking at the O_SYNC stuff in ext3:
       http://lxr.linux.no/source/fs/ext3/file.c#L67
   it looks like in this conditional:
    if (file->f_flags & O_SYNC) {
       ...
       goto force_commit;
    }
   the goto branch calls ext3_force_commit() in much the
   same way that it seems fsync() does here:
       http://lxr.linux.no/source/fs/ext3/fsync.c#L71
   so I believe O_SYNC does at least as much as fsync().
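
   Seen from user space, the two paths being compared would look
   roughly like the sketch below. This is only an illustration of the
   two calling patterns; the function names write_with_osync and
   write_then_fsync are made up for this example, and nothing here
   shows what the kernel does internally.

```c
#include <fcntl.h>
#include <unistd.h>

/* Variant A (hypothetical name): open with O_SYNC, so each write(2)
 * is expected to block until the kernel reports the data as written. */
int write_with_osync(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0600);
    if (fd < 0)
        return -1;
    int rc = (write(fd, data, len) == (ssize_t) len) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Variant B (hypothetical name): ordinary buffered write followed by
 * an explicit fsync(2) on the same descriptor. */
int write_then_fsync(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    int rc = 0;
    if (write(fd, data, len) != (ssize_t) len)
        rc = -1;
    else if (fsync(fd) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

   If the reading of the ext3 source above is right, both variants end
   up in the same commit path on that filesystem.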

   However, I can't find O_DIRECT anywhere in the ext3 code,
   so if it does work there, it's not obvious to me how.

   Moreover I see O_SYNC used lots of places:
       http://lxr.linux.no/ident?i=O_SYNC
   including fs/ext3/, and I don't see O_DIRECT
   in nearly as many places:
       http://lxr.linux.no/ident?i=O_DIRECT
   It looks like reiserfs and xfs do look at O_DIRECT,
   but ext3 doesn't appear to, unless the check is
   somewhere outside the fs/ext3 directory.


PPS: Of course not even fsync() flushed correctly until very recent kernels:
     http://hardware.slashdot.org/comments.pl?sid=149349&cid=12519114
     In that comment Jeff Garzik (the Linux SATA driver guy) suggests
     that until very recent kernels ext3 did not have write barrier
     support that issues the FLUSH CACHE (IDE) or SYNCHRONIZE CACHE
     (SCSI) commands even on fsync.


PPPS: No, I don't understand the kernel - I'm just showing what quick
       grep commands showed without any deep understanding.
