Direct I/O issues

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Direct I/O issues
Date: 2006-11-23 06:30:24
Lists: pgsql-hackers, pgsql-patches, pgsql-performance
I've been trying to optimize a Linux system where benchmarking suggests 
large performance differences between the various wal_sync_method options 
(with o_sync being the big winner).  I started that by using 
src/tools/fsync/test_fsync to get an idea what I was dealing with (and to 
spot which drives had write caching turned on).  Since those results 
didn't match what I was seeing in the benchmarks, I've been browsing the 
backend source to figure out why.  I noticed test_fsync appears to be, 
ahem, out of sync with what the engine is doing.

It looks like V8.1 introduced O_DIRECT writes to the WAL, determined at 
compile time by a series of preprocessor tests in 
src/backend/access/transam/xlog.c.  When O_DIRECT is available, 
O_SYNC/O_FSYNC/O_DSYNC writes use it.  test_fsync doesn't do that.
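
To make the compile-time selection described above concrete, here is a 
minimal sketch of how such preprocessor tests can fold O_DIRECT into the 
sync flag.  It is not the text of xlog.c: the SKETCH_* macros and 
sketch_open_sync_flags() are hypothetical names made up for illustration, 
and adding O_RDWR to the result is an assumption.

```c
#define _GNU_SOURCE             /* asks glibc to expose O_DIRECT in <fcntl.h> */
#include <fcntl.h>

/* Hypothetical names throughout; this only illustrates the pattern. */
#ifdef O_DIRECT
#define SKETCH_O_DIRECT O_DIRECT
#else
#define SKETCH_O_DIRECT 0       /* no O_DIRECT on this platform: adds nothing */
#endif

/* Pick whichever synchronous-open flag the platform defines. */
#if defined(O_SYNC)
#define SKETCH_BARE_SYNC O_SYNC
#elif defined(O_FSYNC)
#define SKETCH_BARE_SYNC O_FSYNC
#else
#define SKETCH_BARE_SYNC 0
#endif

/* Flags for a synchronous open, with O_DIRECT OR-ed in when it exists. */
static int sketch_open_sync_flags(void)
{
    return O_RDWR | SKETCH_BARE_SYNC | SKETCH_O_DIRECT;
}
```

A test program that opens its file with plain O_SYNC exercises a different 
code path than one that opens it with these combined flags, which is the 
mismatch described above.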

I moved the new code (in 8.2 beta 3, lines 61-92 in xlog.c) into 
test_fsync; all the flags had the same name so it dropped right in.  You 
can get the version I made at 
(fixed a compiler warning, too)

The results I get now look fishy.  I'm not sure if I screwed up a step, or 
if I'm seeing a real problem.  The system here is running RedHat Linux, 
RHEL ES 4.0 kernel 2.6.9, and the disk I'm writing to is a standard 
7200RPM IDE drive.  I turned off write caching with hdparm -W 0

Here's an excerpt from the stock test_fsync:

Compare one o_sync write to two:
         one 16k o_sync write     8.717944
         two 8k o_sync writes    17.501980

Compare file sync methods with 2 8k writes:
         (o_dsync unavailable)
         open o_sync, write      17.018495
         write, fdatasync         8.842473
         write, fsync,            8.809117

And here's the version I tried to modify to include O_DIRECT support:

Compare one o_sync write to two:
         one 16k o_sync write     0.004995
         two 8k o_sync writes     0.003027

Compare file sync methods with 2 8k writes:
         (o_dsync unavailable)
         open o_sync, write       0.004978
         write, fdatasync         8.845498
         write, fsync,            8.834037
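
As a rough yardstick for these numbers, the sketch below works out how 
long one rotation of a 7200RPM platter takes and converts a reported total 
into a per-write time.  It assumes each figure is total seconds for a loop 
of 1000 iterations, which is an assumption about test_fsync that is not 
stated here; rotation_ms() and per_write_ms() are hypothetical helper 
names.

```c
#include <stdio.h>

/* Time for one full platter rotation, in milliseconds. */
static double rotation_ms(double rpm)
{
    return 60.0 * 1000.0 / rpm;
}

/*
 * Average time per write in milliseconds, given a total in seconds.
 * `loops` is the assumed iteration count behind the reported total.
 */
static double per_write_ms(double total_seconds, int loops)
{
    return total_seconds * 1000.0 / loops;
}

/* Print the comparison for one reported figure. */
static void report(const char *label, double total_seconds, int loops)
{
    printf("%-24s %8.4f ms/write (one rotation at 7200RPM: %.2f ms)\n",
           label, per_write_ms(total_seconds, loops), rotation_ms(7200.0));
}
```

Under that assumption, report("write, fdatasync", 8.845498, 1000) comes 
out near 8.8 ms per write, close to the roughly 8.33 ms of one rotation, 
while the 0.004978 figure works out to about 0.005 ms per write.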

Obviously the o_sync writes aren't waiting for the disk.  Is this a 
problem with O_DIRECT under Linux?  Or is my code just not correctly 
testing this behavior?
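
One possibility worth ruling out, which nothing here confirms, is that the 
writes are failing rather than completing quickly: on Linux, O_DIRECT 
generally requires the buffer, file offset, and length to be suitably 
aligned, and a write() that returns -1 takes almost no time.  A minimal 
sketch of a write helper that checks for this follows.  
checked_sync_write() and SKETCH_ALIGN are hypothetical names, and the 
4096-byte alignment is an assumption chosen to cover common block sizes.

```c
#define _GNU_SOURCE             /* asks glibc to expose O_DIRECT in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_ALIGN 4096       /* assumed alignment, not taken from any source */

/*
 * Write `len` bytes to `path` through a descriptor opened with
 * `extra_flags` (for example O_SYNC, or O_SYNC | O_DIRECT where available).
 * Returns 0 only if the whole write succeeded, otherwise -1 with a message.
 */
static int checked_sync_write(const char *path, int extra_flags, size_t len)
{
    void *buf = NULL;
    ssize_t written;
    int fd;
    int rc = 0;

    assert(path != NULL && len > 0);

    /* An aligned buffer: plain malloc() makes no such promise. */
    if (posix_memalign(&buf, SKETCH_ALIGN, len) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return -1;
    }
    memset(buf, 'a', len);

    fd = open(path, O_RDWR | O_CREAT | extra_flags, 0600);
    if (fd < 0) {
        fprintf(stderr, "open(%s): %s\n", path, strerror(errno));
        free(buf);
        return -1;
    }

    /* The point of the sketch: never time a write without checking it. */
    written = write(fd, buf, len);
    if (written < 0) {
        fprintf(stderr, "write: %s\n", strerror(errno));
        rc = -1;
    } else if ((size_t) written != len) {
        fprintf(stderr, "short write: %zd of %zu bytes\n", written, len);
        rc = -1;
    }

    if (close(fd) != 0 && rc == 0) {
        fprintf(stderr, "close: %s\n", strerror(errno));
        rc = -1;
    }
    free(buf);
    return rc;
}
```

Calling this inside the timing loop with the same flags the test uses 
would show whether the fast o_sync figures come from writes that actually 
succeeded.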

Just as a sanity check, I did try this on another system, running SuSE 
with drives connected to a cciss SCSI device, and I got exactly the same 
results.  I'm concerned that Linux users who use O_SYNC because they 
notice it's faster will be losing their WAL integrity without being aware 
of the problem, especially as the whole O_DIRECT business isn't even 
mentioned in the WAL documentation--it really deserves to be brought up in 
the wal_sync_method notes at

And while I'm mentioning improvements to that particular documentation 
page...the wal_buffers notes there are so sparse they misled me initially. 
They suggest only bumping it up for situations with very large 
transactions; since I was testing with small ones I left it woefully 
undersized.  I would suggest copying the text from to 
here: "When full_page_writes is set and the system is very busy, setting 
this value higher will help smooth response times during the period 
immediately following each checkpoint."  That seems to match what I found 
in testing.

* Greg Smith gsmith(at)gregsmith(dot)com Baltimore, MD

