Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: We really ought to do something about O_DIRECT and data=journalled on ext4
Date: 2010-12-03 19:55:23
Message-ID: 4CF94B2B.7060001@agliodbs.com
Lists: pgsql-hackers

All,

So, I've been doing some reading about this issue, and I think that,
regardless of what other changes we make, we should never enable O_DIRECT
automatically on Linux, and that it was a mistake for us to do so in the
first place.

First, in the Linux docs for open():

=========

In summary, O_DIRECT is a potentially powerful tool that should be used
with caution. It is recommended that applications treat use of O_DIRECT
as a performance option which is disabled by default.

=========

Second, Linus has a quote about O_DIRECT that I think should serve as an
indicator to us that direct I/O will never be beneficial by default on
Linux, and might even someday be desupported:

============

The right way to do it is to just not use O_DIRECT.

The whole notion of "direct IO" is totally braindamaged. Just say no.

This is your brain: O
This is your brain on O_DIRECT: .

Any questions?

I should have fought back harder. There really is no valid reason for EVER
using O_DIRECT. You need a buffer whatever IO you do, and it might as well
be the page cache. There are better ways to control the page cache than
play games and think that a page cache isn't necessary.

So don't use O_DIRECT. Use things like madvise() and posix_fadvise()
instead.

Linus
=============
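
For readers unfamiliar with the alternative Linus mentions, here is a minimal sketch of ordinary buffered I/O combined with posix_fadvise() hints, in place of opening the file with O_DIRECT. It is illustrative only and is not taken from PostgreSQL. The function names and the chunk size are hypothetical, and it assumes a Unix platform where Python exposes os.posix_fadvise. The advice values are only hints, so the kernel is free to ignore them.

```python
import os


def _advise(fd: int, advice: int) -> None:
    """Pass a page-cache hint for the whole file; ignore failures, since it is only a hint."""
    if hasattr(os, "posix_fadvise"):
        try:
            # offset=0, len=0 means "from the start to the end of the file".
            os.posix_fadvise(fd, 0, 0, advice)
        except OSError:
            pass


def write_then_drop_cache(path: str, payload: bytes) -> int:
    """Write through the page cache, flush to disk, then hint that the pages can be evicted."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        while written < len(payload):
            written += os.write(fd, payload[written:])
        # Flush first: dirty pages generally cannot be dropped until written back.
        os.fsync(fd)
        _advise(fd, getattr(os, "POSIX_FADV_DONTNEED", 0))
        return written
    finally:
        os.close(fd)


def read_sequential(path: str, chunk_size: int = 1 << 16) -> bytes:
    """Read a file with a hint that access will be sequential."""
    fd = os.open(path, os.O_RDONLY)
    try:
        _advise(fd, getattr(os, "POSIX_FADV_SEQUENTIAL", 0))
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

The data still passes through the page cache here; the hints only tell the kernel how the application expects to use it. Whether this performs better than O_DIRECT for a given workload would need to be measured.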

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
