Re: O_DIRECT in freebsd

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Sean Chittenden <sean(at)chittenden(dot)org>
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: O_DIRECT in freebsd
Date: 2003-06-22 22:15:52
Message-ID: 200306222215.h5MMFqR26835@candle.pha.pa.us
Lists: pgsql-hackers


What you really want is Solaris's free-behind, where it detects if a
scan is exceeding a certain percentage of the OS cache and moves the
pages to the _front_ of the to-be-reused list. I am not sure which other
OSes support this, but we need this in our own buffer manager code as
well.

Our TODO already has:

* Add free-behind capability for large sequential scans (Bruce)

Basically, I think we need free-behind rather than O_DIRECT.
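
As a rough illustration of the free-behind idea described above, here is a
minimal sketch in Python. It is a toy model only, not Solaris's or
PostgreSQL's actual code: the class name FreeBehindCache, the helper
run_scan, and the scan_fraction parameter are all hypothetical. Once a
sequential scan has read more pages than a set fraction of the cache, each
page it touches goes to the front of the reuse list, so the scan keeps
recycling its own pages and leaves the rest of the cache alone.

```python
from collections import OrderedDict


class FreeBehindCache:
    """Toy page cache with LRU replacement plus a free-behind rule.

    Hypothetical illustration only; names and thresholds are assumptions.
    """

    def __init__(self, capacity, scan_fraction=0.25):
        self.capacity = capacity
        # A scan that has read more pages than this is treated as "large".
        self.scan_limit = int(capacity * scan_fraction)
        # Key order is the reuse list: the first key is the next victim.
        self.pages = OrderedDict()

    def access(self, page, scan_pages_so_far=0):
        """Touch a page and return True on a cache hit.

        scan_pages_so_far is the number of pages the current sequential
        scan has read so far (0 for ordinary, non-scan accesses).
        """
        hit = page in self.pages
        if hit:
            # Normal LRU behaviour: a hit moves the page to the back.
            self.pages.move_to_end(page)
        else:
            if len(self.pages) >= self.capacity:
                # Evict from the front of the reuse list.
                self.pages.popitem(last=False)
            self.pages[page] = True
        if scan_pages_so_far > self.scan_limit:
            # Free-behind: put the page at the front so it is reused first.
            self.pages.move_to_end(page, last=False)
        return hit


def run_scan(cache, n_pages):
    """Simulate one sequential scan over n_pages distinct pages."""
    for i in range(n_pages):
        cache.access(f"s{i}", scan_pages_so_far=i + 1)
```

With an 8-page cache holding two hot pages, a 100-page scan under this
policy leaves both hot pages cached, whereas the same scan with the rule
effectively disabled (a very large scan_fraction) evicts them.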

---------------------------------------------------------------------------

Sean Chittenden wrote:
> > > > > The reason I mention it is that Postgres already supports
> > > > > O_DIRECT I think on some other platforms (for whatever
> > > > > reason).
> > > >
> > > > [ sounds of grepping... ] No. The only occurrence of O_DIRECT in the
> > > > source tree is in TODO:
> > > >
> > > > * Consider use of open/fcntl(O_DIRECT) to minimize OS caching
> > > >
> > > > I personally disagree with this TODO item for the same reason
> > > > Sean cited: Postgres is designed and tuned to rely on OS-level
> > > > disk caching, and bypassing that cache is far more likely to
> > > > hurt our performance than help it.
> > >
> > > DB2 and Oracle, from memory, allow users to pass hints to the
> > > planner to use/not use file system caching. This could be useful
> > > if you had an application retrieving a large amount of data on an
> > > > ad hoc basis. The large retrieval would empty out the disk cache,
> > > > thereby negatively impacting other applications operating on
> > > data which should be cached.
> >
> > I've recently been bitten by this. On DB2, I could change what
> > bufferpool the large tables were using and set it fairly small, but
> > obviously not an option with PGSQL. But, if pgsql could stop caching
> > > from occurring on user-specified queries, large table or index scans,
> > etc., it would be very helpful.
>
> Actually, now that I think about this, if the planner is going to read
> more than X number of bytes as specified in a GUC, it would be useful
> to have the fd marked as O_DIRECT to avoid polluting the disk
> cache... I have a few tables with about 300M rows (~9GB on disk) that
> I have to perform nightly seq scans over for reports and it does wipe
> out some of the other fast movers that come through and depend on the
> disk cache to be there for their speed. Because they're performed in
> the middle of the night, I don't care that much, but my avg query
> times during that period of time are slower... whether it's load or
> the disk buffer being emptied and having to be refilled, I'm not sure,
> but thinking about it, use of a GUC threshold to have an FD marked as
> O_DIRECT does make sense (0 == disabled and the default, but tunable
> in Kbytes as an admin sees fit) and could be nice for big queries that
> have lots of smaller queries running around at the same time.
>
> -sc
>
> --
> Sean Chittenden
>
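
The threshold proposed in the quoted message (0 disables it, otherwise a
size in kilobytes) can be sketched as follows. This is a hedged
illustration in Python, not PostgreSQL code: the function name
open_flags_for_scan and the parameter direct_threshold_kb are hypothetical,
and no such GUC is named in the thread. os.O_DIRECT is not defined on every
platform, so the sketch falls back to 0 where it is missing.

```python
import os


def open_flags_for_scan(estimated_bytes, direct_threshold_kb=0):
    """Choose open() flags for a scan expected to read estimated_bytes.

    direct_threshold_kb is a hypothetical tunable: 0 disables the
    behaviour (the proposed default); otherwise scans expected to read
    more than that many kilobytes request O_DIRECT.
    """
    flags = os.O_RDONLY
    if direct_threshold_kb > 0 and estimated_bytes > direct_threshold_kb * 1024:
        # O_DIRECT is platform-specific; add nothing where it is absent.
        flags |= getattr(os, "O_DIRECT", 0)
    return flags
```

The result would be passed to os.open(path, flags). Reading through an
O_DIRECT descriptor typically also requires suitably aligned buffers and
offsets, which this sketch does not handle.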

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
