Re: O_DIRECT in freebsd

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Sean Chittenden <sean(at)chittenden(dot)org>
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: O_DIRECT in freebsd
Date: 2003-06-23 00:42:45
Message-ID: 200306230042.h5N0gji07128@candle.pha.pa.us
Lists: pgsql-hackers

Sean Chittenden wrote:
> > Basically, when we read a buffer we don't know whether it will be
> > used read-only or read/write. In fact, we could read it in, and
> > another backend could write it for us.
>
> Um, wait. The cache is shared between backends? I don't think so,
> but it shouldn't matter because there has to be a semaphore locking
> the cache to prevent the coherency issue you describe. If PostgreSQL
> didn't, it'd be having problems with this now. I'd also think that
> MVCC would handle the case of updated data in the cache as that has to
> be a common case. At what point is the cached result invalidated and
> fetched from the OS?

Uh, it's called the _shared_ buffer cache in postgresql.conf, and we
lock pages only while we are reading/writing them, not for the duration
they are in the cache.

> > The big issue is that when we do a write, we don't wait for it to
> > get to disk.
>
> Only in the case when fsync() is turned off, but again, it's up to
> the OS to manage that can of worms, which I think BSD takes care of.
> From conf/NOTES:

Nope. When you don't have a kernel buffer cache, and you do a write,
where do you expect it to go? I assume it goes to the drive, and you
have to wait for that.

>
> # Attempt to bypass the buffer cache and put data directly into the
> # userland buffer for read operation when O_DIRECT flag is set on the
> # file. Both offset and length of the read operation must be
> # multiples of the physical media sector size.
> #
> #options DIRECTIO
>
> The offsets and length bit kinda bothers me though, but I think that's
> stuff that the kernel has to take into account, not the userland calls.
> I wonder if that's actually accurate any more or affects userland
> calls... seems like that'd be a bit too much work to have the user
> do, esp. given the lack of documentation on the flag... it should be
> just a drop-in additional flag, afaict.
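
To make the NOTES restriction concrete, here is a minimal sketch of the
arithmetic it implies for a caller: checking whether a request is already
sector-aligned, and widening one that is not. The function names and the
512-byte sector size are assumptions for illustration only; the real sector
size depends on the device, and nothing here is taken from the FreeBSD or
PostgreSQL sources.

```python
SECTOR_SIZE = 512  # assumed sector size; the real value depends on the device


def is_direct_io_aligned(offset: int, length: int, sector_size: int = SECTOR_SIZE) -> bool:
    """True if both offset and length are multiples of the sector size."""
    return offset % sector_size == 0 and length % sector_size == 0


def widen_to_sectors(offset: int, length: int, sector_size: int = SECTOR_SIZE):
    """Return (start, length) of the smallest sector-aligned range
    that fully covers the requested byte range."""
    start = offset - (offset % sector_size)          # round start down
    end = offset + length
    aligned_end = -(-end // sector_size) * sector_size  # round end up
    return start, aligned_end - start
```

An 8192-byte page read at an 8192-byte boundary passes the check as is, so
with these assumed sizes only odd-sized or odd-offset requests would need
widening.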
>
> > It seems that to use O_DIRECT, we would have to read the buffer in a
> > special way to mark it as read-only, which seems kind of strange. I
> > see no reason we can't use free-behind in the PostgreSQL buffer
> > cache to handle most of the benefits of O_DIRECT, without the
> > read-only buffer restriction.
>
> I don't see how this'd be an issue as buffers populated via a read(),
> that are updated, and then written out, would occupy a new chunk of
> disk to satisfy MVCC. Why would we need to mark a buffer as read only
> and carry around/check its state?

We update the expired flags on the tuple during update/delete.
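
As a rough illustration of that point, here is a toy model (not PostgreSQL
code; the class and function names are hypothetical, and xmin/xmax are only
loosely modeled on the tuple header fields) showing that an update writes the
new row version elsewhere but still modifies the old version in place, so the
page that was read in becomes dirty:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TupleVersion:
    value: str
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that expired it, None if live


@dataclass
class Page:
    tuples: List[TupleVersion] = field(default_factory=list)
    dirty: bool = False


def update_tuple(old_page: Page, index: int, new_value: str,
                 xid: int, new_page: Page) -> None:
    """Toy MVCC update: expire the old version in place, add a new one."""
    old_page.tuples[index].xmax = xid  # in-place change to the page we read
    old_page.dirty = True              # so this page must be written back too
    new_page.tuples.append(TupleVersion(new_value, xmin=xid))
    new_page.dirty = True
```

In this model a page cannot be treated as read-only just because its new row
versions land somewhere else.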

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
