Re: O_DIRECT in freebsd

From: Sean Chittenden <sean(at)chittenden(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: O_DIRECT in freebsd
Date: 2003-06-23 04:01:35
Message-ID: 20030623040135.GO97131@perrin.int.nxad.com
Lists: pgsql-hackers

> >> it doesn't seem totally out of the question. I'd kinda like to
> >> see some experimental evidence that it's worth doing though.
> >> Anyone care to make a quick-hack prototype and do some
> >> measurements?
>
> > What would you like to measure? Overall system performance when a
> > query is using O_DIRECT or are you looking for negative/positive
> > impact of read() not using the FS cache? The latter is much
> > easier to do than the former... recreating a valid load
> > environment that'd let any O_DIRECT benchmark be useful isn't
> > trivial.
>
> If this stuff were easy, we'd have done it already ;-).

What do you mean? Bits don't just hit the tree randomly because of a
possible speed improvement hinted at by a man page reference? :-]

> The first problem is to figure out what makes sense to measure.

Egh, yeah, and this isn't trivial either.... benchmarking around vfs
caching makes it hard to get good results (been down that primrose
path before with sendfile() happiness).

> Given that the request is for a quick-and-dirty test, I'd be willing
> to cut you some slack on the measurement process. That is, it's
> okay to pick something easier to measure over something harder to
> measure, as long as you can make a fair argument that what you're
> measuring is of any interest at all...

hrm, well, given the easy part is thumping out the code, how's the
following sound as a test procedure:

1) Write out several files at varying sizes using O_DIRECT (512KB,
1MB, 5MB, 10MB, 50MB, 100MB, 512MB, 1GB) to avoid having the FS
cache polluted by the writes.

2) Open two new procs that read the above created files with and
without O_DIRECT (each test iteration must rewrite the files
above).

3) Before and after each read() call (does PostgreSQL use fread(3) or
read(2)?), use gettimeofday(2) to get high resolution timing of the
time required to perform each system call.

4) Perform each of the tests above 4 times, averaging the last three
and throwing out the 1st case (though reporting its value may be of
interest).
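
A minimal C sketch of steps 3 and 4, assuming read(2) rather than
fread(3) and an 8 KB request size (picked to match PostgreSQL's default
block size). The helper names time_reads() and average_after_warmup(),
the 4096-byte buffer alignment, and the fallback that defines O_DIRECT
as 0 where the platform lacks it are all illustrative assumptions, not
something taken from this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes O_DIRECT on Linux; harmless elsewhere */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0           /* assumption: silently fall back to buffered I/O */
#endif

#define BLOCK_SIZE 8192      /* assumed request size per read() */
#define BUF_ALIGN  4096      /* assumed alignment that O_DIRECT will accept */

/*
 * Hypothetical helper: read the whole file in BLOCK_SIZE chunks and
 * accumulate the microseconds spent inside read(2) only.
 * Returns 0 on success, -1 on any open/alloc/read failure.
 */
int time_reads(const char *path, int use_direct, long *elapsed_us, long *bytes)
{
    int fd = open(path, O_RDONLY | (use_direct ? O_DIRECT : 0));
    if (fd < 0)
        return -1;

    void *buf = NULL;
    if (posix_memalign(&buf, BUF_ALIGN, BLOCK_SIZE) != 0) {
        close(fd);
        return -1;
    }

    int rc = 0;
    *elapsed_us = 0;
    *bytes = 0;
    for (;;) {
        struct timeval t0, t1;
        gettimeofday(&t0, NULL);               /* just before the call */
        ssize_t n = read(fd, buf, BLOCK_SIZE);
        gettimeofday(&t1, NULL);               /* just after the call */
        if (n < 0) {
            rc = -1;
            break;
        }
        *elapsed_us += (t1.tv_sec - t0.tv_sec) * 1000000L
                     + (t1.tv_usec - t0.tv_usec);
        if (n == 0)                            /* end of file */
            break;
        *bytes += n;
    }
    free(buf);
    close(fd);
    return rc;
}

/* Step 4: throw out the first (cold) sample and average the rest. */
double average_after_warmup(const long *samples, int n)
{
    assert(n > 1);
    long sum = 0;
    for (int i = 1; i < n; i++)
        sum += samples[i];
    return (double) sum / (n - 1);
}
```

Calling time_reads() once with use_direct set and once without, on
freshly rewritten files, gives the two numbers to compare. On a
filesystem that rejects O_DIRECT the open() or read() fails and the
helper simply returns -1.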

I'm not that wild about writing anything threaded unless there's
strong enough interest in a write() to an O_DIRECT'ed fd to see what
happens. I'm not convinced we'll see anything worthwhile unless I
set up an example that's doing a ton of write disk I/O.

As things stand, because O_DIRECT is an execution fast path through
the vfs subsystem, I expect the speed difference to be greater on
faster HDDs with high RPMs than on slower IDE machines at only
5400RPM... thus trivializing any benchmark I'll do on my laptop. And
actually, if the app can't keep up with the disk, I bet the fs cache
case will be faster. If the read()s are able to keep up with the
HDD, however, this could be a big win in the speed dept, but if
things lag for an instant, the platter will have to make another
rotation before the call comes back to userland.
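
To put a rough number on that rotation penalty: one full rotation
takes 60000 / RPM milliseconds, which is about 11.1 ms at 5400 RPM.
The helper name rotation_ms() below is made up for illustration, and
the figure ignores seek time and any on-drive caching.

```c
#include <assert.h>

/* Time for one full platter rotation, in milliseconds. */
double rotation_ms(double rpm)
{
    assert(rpm > 0.0);
    return 60000.0 / rpm;    /* 60,000 ms per minute / rotations per minute */
}
```

So a read() that just misses its sector pays on the order of 11 ms on
a 5400 RPM laptop drive, versus 4 ms on a 15000 RPM drive.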

Now that I think about it, the optimal case would be to anonymously
mmap() a private buffer that the read() writes into; that way the
HDD could just DMA the data into the mmap()'ed buffer, making it a
zero-copy read operation.... though my attempt to stir up interest
with my mmap() benchmarks from a while back seems to have been lost
in the fray. :)
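
A rough sketch of that idea in C, under stated assumptions: the helper
name read_into_anon_map() is hypothetical, O_DIRECT is defined as 0
where the platform lacks it, and whether the kernel really DMAs
straight into the mapping depends on the OS, filesystem, and driver.
The only thing the code guarantees is a page-aligned private buffer as
the read() target.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes O_DIRECT and MAP_ANON on Linux */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0           /* assumption: fall back to buffered I/O */
#endif
#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/*
 * Hypothetical helper: map `len` bytes of anonymous private memory and
 * issue a single read() of up to `len` bytes from `path` into it.
 * Returns the mapping (caller must munmap() it) or MAP_FAILED on error;
 * *nread receives the byte count returned by read().
 */
void *read_into_anon_map(const char *path, size_t len, int use_direct,
                         ssize_t *nread)
{
    assert(len > 0 && nread != NULL);

    int fd = open(path, O_RDONLY | (use_direct ? O_DIRECT : 0));
    if (fd < 0)
        return MAP_FAILED;

    /* Anonymous mappings are page-aligned, which suits O_DIRECT. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (buf != MAP_FAILED) {
        *nread = read(fd, buf, len);
        if (*nread < 0) {
            munmap(buf, len);
            buf = MAP_FAILED;
        }
    }
    close(fd);
    return buf;
}
```

For O_DIRECT the length should also be a multiple of the device's
sector size; a multiple of the page size is a safe guess.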

-sc

--
Sean Chittenden
