Re: adding support for posix_fadvise()

From: Neil Conway <neilc(at)samurai(dot)com>
To: Hannu Krosing <hannu(at)tm(dot)ee>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: adding support for posix_fadvise()
Date: 2003-11-03 13:50:00
Message-ID: 1067867399.3089.219.camel@tokyo
Lists: pgsql-hackers

On Mon, 2003-11-03 at 04:21, Hannu Krosing wrote:
> Neil Conway wrote on Mon, 03.11.2003 at 08:07:
> > (2) ISTM that we can set POSIX_FADV_RANDOM for *all* indexes, since the
> > vast majority of the accesses to them shouldn't be sequential.
>
> Perhaps we could do it for all _leaf_ nodes, the root and intermediate
> nodes are usually better kept in cache.

POSIX_FADV_RANDOM doesn't affect the page cache; it just determines how
aggressively the kernel does readahead (at least on Linux, but I'd
expect other kernels to implement similar behavior). In other words,
using FADV_RANDOM shouldn't decrease the chance that interior B+-tree
nodes are kept in the page cache.
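
As a rough sketch of what setting this advice might look like (the
helper name advise_index_random is hypothetical, not existing
PostgreSQL code; only posix_fadvise() and POSIX_FADV_RANDOM come from
POSIX):

```c
#define _XOPEN_SOURCE 600   /* ask <fcntl.h> to declare posix_fadvise() */
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: hint that the file behind fd (assumed to be an
 * index) will be read in random order. offset = 0 with len = 0 covers
 * the whole file. Returns 0 on success, or an error number on failure;
 * posix_fadvise() returns the error directly instead of setting errno.
 */
int
advise_index_random(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
}
```

Since the advice is only a hint, a caller could reasonably ignore a
nonzero return value.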

> True. POSIX_FADV_DONTNEED should be only used if the page was retrieved
> by VACUUM.

Right -- we'd like a page touched by VACUUM to be flushed from the page
cache if it wasn't previously in *either* the PostgreSQL buffer pool or
the kernel's page cache. Checking the buffer pool is easy enough, but I
don't see any feasible way to check the page cache: on a high-end
machine with gigabytes of RAM but a relatively small shared_buffers
(which is the configuration we recommend), there may be plenty of hot
pages that aren't in the PostgreSQL buffer pool but are in the page
cache.
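
A minimal sketch of the half we can implement, assuming the caller
already knows whether the page was in the buffer pool before VACUUM
read it. The function name, the was_in_buffer_pool flag, and the
SKETCH_BLOCK_SIZE constant are all hypothetical; this does nothing
about pages that were hot only in the kernel's page cache:

```c
#define _XOPEN_SOURCE 600   /* ask <fcntl.h> to declare posix_fadvise() */
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Assumed block size for this sketch (PostgreSQL's default BLCKSZ is 8192). */
#define SKETCH_BLOCK_SIZE 8192

/*
 * Hypothetical helper: after VACUUM has finished with block blockno of
 * the file behind fd, tell the kernel the block is no longer needed,
 * but only if the block was not already in the buffer pool beforehand.
 * Returns 0 on success or when nothing was done, else an error number.
 */
int
vacuum_release_block(int fd, long blockno, bool was_in_buffer_pool)
{
    if (was_in_buffer_pool)
        return 0;           /* page was already hot; leave it alone */

    return posix_fadvise(fd,
                         (off_t) blockno * SKETCH_BLOCK_SIZE,
                         SKETCH_BLOCK_SIZE,
                         POSIX_FADV_DONTNEED);
}
```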

> also, you may want to restore old FADV* after you are done - just
> running one seqscan should probably not leave the relation in
> POSIX_FADV_SEQUENTIAL mode forever.

Right, I forgot to mention that. The API doesn't provide a means to get
the current advice for an FD. So when we're finished doing whatever
operation we set some advice for, we'll need to just reset the file to
FADV_NORMAL and hope that it doesn't overrule some advice just set by
someone else. Either that, or we can manually keep track of all the
advice we're setting ourselves, but that seems a hassle.
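
The "set advice, do the work, reset to FADV_NORMAL" approach might look
roughly like this. The function name and the plain read() loop are
hypothetical stand-ins for a real sequential scan, and the sketch makes
no attempt to preserve advice set by anyone else:

```c
#define _XOPEN_SOURCE 600   /* ask <fcntl.h> to declare posix_fadvise() */
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the file behind fd from its current
 * position to the end with sequential-access advice in effect, then
 * put the descriptor back to the default advice. Returns the number of
 * bytes read, or -1 if read() failed.
 */
long
scan_with_sequential_advice(int fd)
{
    char    buf[8192];
    long    total = 0;
    ssize_t n;

    /* The advice is only a hint, so failures are ignored here. */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;

    /*
     * The previous advice cannot be queried, so fall back to the
     * default. This may overrule advice that someone else just set.
     */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_NORMAL);

    return (n < 0) ? -1 : total;
}
```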

-Neil
