Re: O_DIRECT in freebsd

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Sean Chittenden <sean(at)chittenden(dot)org>
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: O_DIRECT in freebsd
Date: 2003-06-22 23:50:48
Message-ID: 200306222350.h5MNomr03736@candle.pha.pa.us
Lists: pgsql-hackers


Basically, when we read a buffer, we don't know whether it will be used
read-only or read/write. In fact, we could read it in, and another
backend could write it for us.

The big issue is that when we do a write, we don't wait for it to get to
disk.
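
To make the distinction concrete, here is a minimal C sketch of an
ordinary buffered write. write_page() is a hypothetical helper, not
PostgreSQL code. pwrite() normally returns once the data has been copied
into the kernel's cache, and the caller only waits for the disk if it
asks for fsync(). With O_DIRECT the write itself would bypass that cache,
so it would usually have to wait for the transfer to the device.

```c
#define _POSIX_C_SOURCE 200809L  /* pwrite(), pread(), fsync() */
#include <assert.h>
#include <fcntl.h>      /* open() and O_* flags for callers */
#include <sys/types.h>
#include <unistd.h>

/*
 * Write one page at offset off (write_page() is hypothetical).
 * pwrite() normally returns once the data has been copied into the
 * kernel's buffer cache; it does not wait for the disk.  Only when
 * durable is nonzero does this block in fsync() until the kernel
 * reports the data as being on stable storage.
 */
int write_page(int fd, const void *buf, size_t len, off_t off, int durable)
{
    assert(buf != NULL);
    ssize_t n = pwrite(fd, buf, len, off);
    if (n < 0 || (size_t)n != len)   /* this sketch treats a short write as an error */
        return -1;
    if (durable && fsync(fd) != 0)   /* the only place we wait for the disk */
        return -1;
    return 0;
}
```

A backend would typically call this with durable set to 0 for ordinary
page writes and defer the fsync() to checkpoint time.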

It seems that, to use O_DIRECT, we would have to read the buffer in a
special way to mark it as read-only, which seems kind of strange. I see
no reason we can't use free-behind in the PostgreSQL buffer cache to get
most of the benefits of O_DIRECT, without the read-only buffer restriction.
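
A rough sketch of the free-behind idea follows, assuming a toy LRU pool.
BufPool, pool_access() and NBUFFERS are hypothetical names, not
PostgreSQL's buffer manager. Pages pulled in by a large scan are placed at
the cold end of the LRU order, so the scan keeps recycling its own buffers
instead of pushing hot pages out of the cache.

```c
#include <assert.h>
#include <stdint.h>

#define NBUFFERS 4   /* hypothetical, tiny pool size for illustration */

typedef struct {
    long     page;       /* page number held, -1 if the slot is empty */
    uint64_t last_used;  /* logical clock; smaller values are evicted first */
} BufDesc;

typedef struct {
    BufDesc  buf[NBUFFERS];
    uint64_t clock;      /* starts at 1 so normal pages always beat 0 */
} BufPool;

void pool_init(BufPool *p)
{
    for (int i = 0; i < NBUFFERS; i++) {
        p->buf[i].page = -1;
        p->buf[i].last_used = 0;
    }
    p->clock = 1;
}

/* Choose a victim: the first empty slot, else the least recently used. */
static int pick_victim(const BufPool *p)
{
    int victim = 0;
    for (int i = 0; i < NBUFFERS; i++) {
        if (p->buf[i].page == -1)
            return i;
        if (p->buf[i].last_used < p->buf[victim].last_used)
            victim = i;
    }
    return victim;
}

/* Access a page and return its slot.  free_behind marks a bulk scan. */
int pool_access(BufPool *p, long page, int free_behind)
{
    assert(page >= 0);                      /* -1 is reserved for "empty" */
    for (int i = 0; i < NBUFFERS; i++) {
        if (p->buf[i].page == page) {       /* hit: ordinary LRU bump */
            p->buf[i].last_used = p->clock++;
            return i;
        }
    }
    int slot = pick_victim(p);              /* miss: reuse a buffer */
    p->buf[slot].page = page;               /* a real pool would read the page here */
    /* Free-behind: a scan page goes to the cold end, so the next miss
       recycles it rather than evicting a hot page. */
    p->buf[slot].last_used = free_behind ? 0 : p->clock++;
    return slot;
}
```

With free_behind off, a ten-page scan through this four-buffer pool
evicts previously used pages 1-3; with it on, they stay resident. A real
implementation would also need buffer pinning and a policy for deciding
which scans count as "large".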

---------------------------------------------------------------------------

Sean Chittenden wrote:
> > _That_ is an excellent point. However, do we know at the time we
> > open the file descriptor if we will be doing this?
>
> Doesn't matter; it's an option to fcntl().
>
> > What about cache coherency problems with other backends not opening
> > with O_DIRECT?
>
> That's a problem for the kernel VM, if you mean cache coherency in the
> VM. If you mean inside of the backend, that could be a stickier
> issue, I think. I don't know enough of the internals yet to know if
> this is a problem or not, but you're right, it's certainly something
> to consider. Is the cache a write-behind cache or is it a read-
> through cache? If it's a read-through cache, which I think it is,
> then the backend would have to dirty all cache entries pertaining to
> the relations being opened with O_DIRECT. The use case for that
> would be:
>
> 1) a transaction begins
> 2) a few rows out of the huge table are read
> 3) a huge query is performed that triggers the use of O_DIRECT
> 4) the rows selected in step 2 are updated (this step should poison or
> update the cache, actually, and act as a write through cache if the
> data is in the cache)
> 5) the same few rows are read in again
> 6) transaction is committed
>
> Provided the cache is poisoned or updated in step 4, I can't see how
> or where this would be a problem. Please enlighten me if there's a
> different case that would need to be taken into account. I can't
> imagine ever wanting to write data out using O_DIRECT; I think of it
> as a read-only optimization meant to minimize turnover in the OS's
> cache. From fcntl(2):
>
> O_DIRECT   Minimize or eliminate the cache effects of reading and
>            writing. The system will attempt to avoid caching the data
>            you read or write. If it cannot avoid caching the data, it
>            will minimize the impact the data has on the cache. Use of
>            this flag can drastically reduce performance if not used
>            with care.
>
>
> > And finally, how do we deal with the fact that writes to O_DIRECT
> > files will wait until the data hits the disk because there is no
> > kernel buffer cache?
>
> Well, two things.
>
> 1) O_DIRECT should never be used on writes. I can't think of a case
> where you'd want the write cache off; even when COPY'ing data or
> restoring a DB, it just doesn't make sense to use it. The write
> buffer is emptied as soon as the pages hit the disk unless something
> is reading those bits, and I'd imagine the OS uses it to get as much
> data as possible onto the platter in a single write. Circumventing
> that would be insane (though possibly useful for embedded devices
> with low RAM, solid-state drives, or some super nice EMC Fibre
> Channel storage device that basically has its own huge disk cache).
>
> 2) Last I checked, PostgreSQL wasn't a threaded app and didn't use
> non-blocking I/O. The backend would block until the call returns,
> so where's the problem? :)
>
> If anything, O_DIRECT would shake out any bugs in PostgreSQL's
> caching code, if there are any. -sc
>
> --
> Sean Chittenden
>
>
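
For reference, toggling O_DIRECT on an already-open descriptor, as Sean
suggests, could look roughly like the sketch below. set_direct(),
is_direct() and open_for_scan() are hypothetical helpers; the
fcntl(F_GETFL) and fcntl(F_SETFL) calls are standard, and both FreeBSD
and Linux accept O_DIRECT in F_SETFL. A filesystem that cannot do direct
I/O may still reject the change (Linux, for example, returns EINVAL).

```c
#define _GNU_SOURCE     /* glibc needs this to expose O_DIRECT; harmless on FreeBSD */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Turn O_DIRECT on or off for an open descriptor.
   Returns 0 on success, -1 with errno set on failure. */
int set_direct(int fd, int on)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    flags = on ? (flags | O_DIRECT) : (flags & ~O_DIRECT);
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}

/* Report whether O_DIRECT is currently set: 1 or 0, or -1 on error. */
int is_direct(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags == -1 ? -1 : (flags & O_DIRECT) != 0;
}

/* Open path read-only and, if requested, switch on O_DIRECT afterwards,
   e.g. just before a huge scan.  On failure the descriptor is closed,
   errno is preserved, and -1 is returned. */
int open_for_scan(const char *path, int direct)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (direct && set_direct(fd, 1) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

On Linux in particular, reads on a descriptor with O_DIRECT set must use
buffers, offsets and lengths aligned to the device's block size
(posix_memalign() is the usual way to get such a buffer), which is one
more reason the flag does not drop transparently into an existing read
path.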

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
