Re: Use of O_DIRECT only for open_* sync options

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Use of O_DIRECT only for open_* sync options
Date: 2011-03-11 11:47:21
Message-ID: 201103111147.p2BBlLN29891@momjian.us
Lists: pgsql-hackers

Greg Smith wrote:
> Bruce Momjian wrote:
> > xlogdefs.h says:
> >
> > /*
> > * Because O_DIRECT bypasses the kernel buffers, and because we never
> > * read those buffers except during crash recovery, it is a win to use
> > * it in all cases where we sync on each write(). We could allow O_DIRECT
> > * with fsync(), but because skipping the kernel buffer forces writes out
> > * quickly, it seems best just to use it for O_SYNC. It is hard to imagine
> > * how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
> > * Also, O_DIRECT is never enough to force data to the drives, it merely
> > * tries to bypass the kernel cache, so we still need O_SYNC or fsync().
> > */
> >
> > This seems wrong because fsync() can win if there are two writes before
> > the sync call. Can kernels not issue fsync() if the write was O_DIRECT?
> > If that is the cause, we should document it.
> >
>
> The comment does look busted, because you did imagine exactly a case
> where they might be combined. The only incompatibility that I'm aware
> of is that O_DIRECT requires reads and writes to be aligned properly, so
> you can't use it in random application code unless it's aware of that.
> O_DIRECT and fsync are compatible; for example, MySQL allows combining
> the two: http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html

C comment updated in git head:

* Because O_DIRECT bypasses the kernel buffers, and because we never
* read those buffers except during crash recovery or if wal_level != minimal,
* it is a win to use it in all cases where we sync on each write(). We could
* allow O_DIRECT with fsync(), but it is unclear if fsync() could process
* writes not buffered in the kernel. Also, O_DIRECT is never enough to force
* data to the drives, it merely tries to bypass the kernel cache, so we still
* need O_SYNC/O_DSYNC.
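
For illustration only, the case raised above (two writes followed by a single sync call) could be sketched roughly as follows. This is not the code in xlog.c. The function name demo_direct_write_then_fsync and the DEMO_BLOCK constant are hypothetical, and the 4096-byte alignment is an assumption; real code would have to determine the alignment required by the device and filesystem. On platforms or filesystems without O_DIRECT support the sketch falls back to ordinary buffered writes.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* no O_DIRECT here: degrade to buffered I/O */
#endif

#define DEMO_BLOCK 4096         /* assumed alignment, not queried from the device */

/*
 * Hypothetical helper: write nblocks aligned blocks using O_DIRECT, then
 * issue a single fsync() for the whole batch.
 * Returns the number of bytes written, or -1 on error.
 */
long
demo_direct_write_then_fsync(const char *path, int nblocks)
{
    void       *buf = NULL;
    long        total = 0;
    int         fd;
    int         i;

    /* O_DIRECT wants the buffer address, length and file offset aligned. */
    if (posix_memalign(&buf, DEMO_BLOCK, DEMO_BLOCK) != 0)
        return -1;
    assert((uintptr_t) buf % DEMO_BLOCK == 0);
    memset(buf, 'x', DEMO_BLOCK);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)
    {
        /* Filesystem rejected O_DIRECT; retry with buffered I/O. */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    }
    if (fd < 0)
    {
        free(buf);
        return -1;
    }

    /* Several writes, none of them synced individually. */
    for (i = 0; i < nblocks; i++)
    {
        ssize_t     n = write(fd, buf, DEMO_BLOCK);

        if (n != DEMO_BLOCK)
        {
            close(fd);
            free(buf);
            return -1;
        }
        total += n;
    }

    /* One fsync() covers all of the writes above. */
    if (fsync(fd) != 0)
    {
        close(fd);
        free(buf);
        return -1;
    }

    close(fd);
    free(buf);
    return total;
}
```

Calling demo_direct_write_then_fsync("/tmp/demo.bin", 2) issues two aligned writes and one fsync(), where opening with O_SYNC would sync on each write. Whether that is actually faster depends on the kernel and storage, and would have to be measured.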

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
