Re: We really ought to do something about O_DIRECT and data=journalled on ext4

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: We really ought to do something about O_DIRECT and data=journalled on ext4
Date: 2010-12-01 22:48:05
Message-ID: 4CF6D0A5.8080501@2ndquadrant.com
Lists: pgsql-hackers

Tom Lane wrote:
> I think the best answer is to get out of the business of using
> O_DIRECT by default, especially seeing that available evidence
> suggests it might not be a performance win anyway.
>

I was concerned that open_datasync might be doing a better job of
forcing data out of drive write caches. But the tests I've done on
RHEL6 so far suggest that's not true; the write guarantees seem to be
the same as when using fdatasync. And there's certainly one performance
regression possible going from fdatasync to open_datasync: the case
where you're overflowing wal_buffers before you actually commit, since
each of those early WAL writes then has to wait for the disk.
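To make the wal_buffers overflow case concrete, here is a rough bash sketch of the two write patterns. It assumes GNU coreutils dd, and the helper names are hypothetical. `oflag=dsync` opens the output with O_DSYNC, so every 8 kB write blocks until it is on stable storage, roughly what open_datasync does when wal_buffers fills before commit. `conv=fdatasync` issues ordinary buffered writes followed by a single fdatasync() before exiting, roughly the fdatasync method's one flush at commit. It doesn't use O_DIRECT, and it's only an analogy, not how PostgreSQL itself issues WAL writes.

```bash
#!/usr/bin/env bash
# Sketch only: write_open_datasync / write_fdatasync are hypothetical names.
# Assumes GNU dd (oflag=dsync, conv=fdatasync).

# open_datasync-like: output opened with O_DSYNC, so each of the COUNT
# 8 kB write() calls waits until its data reaches stable storage.
write_open_datasync() {
    local file=$1 count=$2
    dd if=/dev/zero of="$file" bs=8192 count="$count" oflag=dsync 2>/dev/null
}

# fdatasync-like: COUNT ordinary buffered writes, then one fdatasync()
# before dd exits, analogous to a single flush at commit time.
write_fdatasync() {
    local file=$1 count=$2
    dd if=/dev/zero of="$file" bs=8192 count="$count" conv=fdatasync 2>/dev/null
}
```

Wrapping each call in bash's `time` against a scratch file on the same filesystem as pg_xlog (for example `time write_open_datasync /path/to/scratch 128`) would be expected to show the per-write synchronous version costing more on most drives, though the actual numbers depend entirely on the hardware and write cache.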

Below is a test of the troublesome behavior on the same RHEL6 system I
posted test_fsync performance results for at
http://archives.postgresql.org/message-id/4CE2EBF8.4040602@2ndquadrant.com

This confirms that the problem here is the kernel now advertising
O_DSYNC as available, but not actually supporting it when the
filesystem is mounted in journaled mode. That's clearly a kernel bug
and no fault of PostgreSQL; it's just never been exposed in a default
configuration before. The Red Hat bugzilla report seems a bit unclear
about what's going on here, so it may be worth updating it to note the
underlying cause.
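One way to check whether a given filesystem has the problem, without starting the server, is to attempt a direct, data-synchronous write there. This bash sketch assumes GNU dd, whose `oflag=direct,dsync` opens the output with O_DIRECT|O_DSYNC, roughly the flag combination PostgreSQL uses for open_datasync when it is able to use direct I/O; `probe_direct_dsync` is a hypothetical name. On ext4 mounted with data=journal the open would be expected to fail with "Invalid argument", matching the PANIC in the transcript.

```bash
#!/usr/bin/env bash
# Sketch only: probe_direct_dsync is a hypothetical helper name.
# Prints "supported" if DIR's filesystem accepts an O_DIRECT|O_DSYNC write,
# "unsupported" otherwise. Any dd failure (EINVAL, EACCES, ENOSPC, ...)
# is reported as "unsupported"; a real check would inspect the error.
probe_direct_dsync() {
    local dir=$1
    local probe="$dir/.direct_dsync_probe.$$"
    local result
    # GNU dd: oflag=direct,dsync opens the output with O_DIRECT|O_DSYNC.
    if dd if=/dev/zero of="$probe" bs=8192 count=1 oflag=direct,dsync 2>/dev/null; then
        result=supported
    else
        result=unsupported
    fi
    # open() may have created the file before failing, so always clean up.
    rm -f "$probe"
    echo "$result"
}
```

Running `probe_direct_dsync "$PGDATA/pg_xlog"` before and after switching the mount to data=journal should show the difference, assuming nothing else (permissions, free space) causes dd to fail.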

Regardless, I'm now leaning heavily toward the idea of avoiding
open_datasync by default given this bug, and backpatching that change to
at least 8.4. I'll do some more database-level performance tests here
just as a final sanity check on that. My gut feel is now that we'll
eventually be taking something like Marti's patch, adding some more
documentation around it, and applying that to HEAD as well as some
number of back branches.

$ mount | head -n 1
/dev/sda7 on / type ext4 (rw)
$ cat $PGDATA/postgresql.conf | grep wal_sync_method
#wal_sync_method = fdatasync # the default is the first option
$ pg_ctl start
server starting
LOG: database system was shut down at 2010-12-01 17:20:16 EST
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
$ psql -c "show wal_sync_method"
wal_sync_method
-----------------
open_datasync

[Edit /etc/fstab, change mount options to be "data=journal" and reboot]

$ mount | grep journal
/dev/sda7 on / type ext4 (rw,data=journal)
$ cat postgresql.conf | grep wal_sync_method
#wal_sync_method = fdatasync # the default is the first option
$ pg_ctl start
server starting
LOG: database system was shut down at 2010-12-01 12:14:50 EST
PANIC: could not open file "pg_xlog/000000010000000000000001" (log file 0, segment 1): Invalid argument
LOG: startup process (PID 2690) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
$ pg_ctl stop

$ vi $PGDATA/postgresql.conf
$ cat $PGDATA/postgresql.conf | grep wal_sync_method
wal_sync_method = fdatasync # the default is the first option
$ pg_ctl start
server starting
LOG: database system was shut down at 2010-12-01 12:14:40 EST
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
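The manual vi edit in the transcript can also be scripted. This is a minimal bash sketch assuming GNU sed (`-i`, `-E`) and that the setting sits on a single line, commented out or not, as in the stock postgresql.conf shown in the transcript; `set_wal_sync_method` is a hypothetical name, and the method argument is assumed to be a plain word such as fdatasync.

```bash
#!/usr/bin/env bash
# Sketch only: set_wal_sync_method is a hypothetical helper name.
# Assumes GNU sed and a single (possibly commented) wal_sync_method line.
# If no such line exists, the file is left unchanged.
set_wal_sync_method() {
    local conf=$1 method=$2
    # Uncomment/replace the setting's value, keeping any trailing comment.
    sed -i -E "s/^#?[[:space:]]*wal_sync_method[[:space:]]*=[[:space:]]*[^[:space:]#]+/wal_sync_method = $method/" "$conf"
}
```

Applied to the stock line `#wal_sync_method = fdatasync # the default is the first option`, it yields the same uncommented line the transcript ends up with, after which the server can be started as shown.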

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
