
Re: Prereading using posix_fadvise (was Re: Commitfest patches)

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Zeugswetter Andreas OSB SD <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>, Gregory Stark <stark(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prereading using posix_fadvise (was Re: Commitfest patches)
Date: 2008-03-28 15:59:35
Message-ID: 47ED15E7.7020003@enterprisedb.com
Lists: pgsql-hackers
Bruce Momjian wrote:
> Heikki Linnakangas wrote:
>>> So it has nothing to do with table size. The fadvise calls need to be
>>> (and are) limited by what can be used in the near future, and not for
>>> the whole statement.
>> Right, I was sloppy. Instead of table size, what matters is the amount 
>> of data the scan needs to access. The point remains that if the data is 
>> already in OS cache, the posix_fadvise calls are a waste of time, 
>> regardless of how many pages ahead you advise.
> 
> I now understand what posix_fadvise() is allowing us to do. 
> posix_fadvise(POSIX_FADV_WILLNEED) allows us to tell the kernel we will
> need a certain block in the future --- this seems much cheaper than a
> background reader.

Yep.

> We know we will need the blocks, and telling the kernel can't hurt,
> except that there is overhead in telling the kernel.  Has anyone
> measured how much overhead?  I would be interested in a test program
> that read the same page over and over again from the kernel, with and
> without a posix_fadvise() call.

Agreed, that needs to be benchmarked next. There's also some overhead in 
doing the buffer manager hash table lookup to check whether the page is 
in shared_buffers. We could reduce that by the more complex approach 
Greg mentioned of allocating a buffer in shared_buffers when we do 
posix_fadvise.
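
A minimal sketch of the kind of test program Bruce describes, assuming Linux or another platform that provides posix_fadvise(). The function name time_repeated_reads and the PAGE_SZ constant are made up for illustration; nothing like this has been posted in the thread. It reads the same page repeatedly, so after the first iteration the page is in the OS cache and any difference between the two modes is the overhead of the advice call itself.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* posix_fadvise, pread, clock_gettime */
#endif
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

#define PAGE_SZ 8192              /* assumed page size (PostgreSQL default) */

/*
 * Read the first page of fd `iterations` times, optionally issuing
 * POSIX_FADV_WILLNEED before each read.  Returns elapsed wall-clock
 * seconds, or -1.0 on error.  (Hypothetical helper, not from any patch.)
 */
double time_repeated_reads(int fd, long iterations, int use_fadvise)
{
    char buf[PAGE_SZ];
    struct timespec start, end;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    for (long i = 0; i < iterations; i++)
    {
        /* The advice is only a hint, so its return value is ignored. */
        if (use_fadvise)
            (void) posix_fadvise(fd, 0, PAGE_SZ, POSIX_FADV_WILLNEED);

        if (pread(fd, buf, PAGE_SZ, 0) < 0)
            return -1.0;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    return (double) (end.tv_sec - start.tv_sec)
        + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling it once with use_fadvise = 0 and once with use_fadvise = 1 on the same file descriptor, with a large iteration count, would give a rough per-call overhead. It does not measure the shared_buffers hash lookup mentioned above.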

> Should we consider only telling the kernel X pages ahead, meaning when
> we are on page 10 we tell it about page 16?

Yes. You don't want to fire off thousands of posix_fadvise calls 
upfront. That'll just flood the kernel, and it will most likely ignore 
any advice after the first few hundred or so. I'm not sure what the 
appropriate amount of read-ahead would be, though. It probably depends a 
lot on the OS and hardware, and needs to be adjustable.
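
To make the "X pages ahead" idea concrete, here is a rough sketch, again assuming a platform with posix_fadvise(). The function read_with_prefetch, the BLOCK_SZ constant and the `distance` parameter are hypothetical names for illustration, not part of any posted patch. Given the list of block numbers a scan will visit, it keeps the advice a fixed number of blocks ahead of the block being read, so with distance = 6 the read of the 10th block is preceded by advice for the 16th.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* posix_fadvise, pread */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SZ 8192             /* assumed block size */

/*
 * Read the blocks listed in `blocks` in order, advising the kernel about
 * the block `distance` entries ahead of the one being read.  Returns the
 * number of blocks read, or -1 if a read fails or comes up short.
 */
long read_with_prefetch(int fd, const long *blocks, size_t nblocks,
                        size_t distance)
{
    char buf[BLOCK_SZ];
    long nread = 0;

    /* Prime the window: advise the first `distance` blocks up front. */
    for (size_t i = 0; i < distance && i < nblocks; i++)
        (void) posix_fadvise(fd, (off_t) blocks[i] * BLOCK_SZ, BLOCK_SZ,
                             POSIX_FADV_WILLNEED);

    for (size_t i = 0; i < nblocks; i++)
    {
        /* Keep the window full: one new hint per block consumed. */
        if (distance > 0 && i + distance < nblocks)
            (void) posix_fadvise(fd, (off_t) blocks[i + distance] * BLOCK_SZ,
                                 BLOCK_SZ, POSIX_FADV_WILLNEED);

        if (pread(fd, buf, BLOCK_SZ, (off_t) blocks[i] * BLOCK_SZ) != BLOCK_SZ)
            return -1;
        nread++;
    }
    return nread;
}
```

Passing distance = 0 disables the hints entirely, which makes it easy to compare different read-ahead settings on the same block list.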

In some cases we can't easily read ahead more than a certain number of 
pages. For example, in a regular index scan, we can easily fire off 
posix_fadvise calls for all the heap pages referenced by a single index 
page, but reading ahead more than that becomes much more complex.
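
As an illustration of the simple case only, the following sketch advises the kernel about every heap block referenced from one index page. It assumes the heap block numbers have already been extracted into an array; advise_heap_blocks and HEAP_BLOCK_SZ are hypothetical names, and the shared_buffers lookup a real implementation would do first is left out. Consecutive duplicates are skipped, since several index entries often point into the same heap page.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* posix_fadvise */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

#define HEAP_BLOCK_SZ 8192        /* assumed heap block size */

/*
 * Issue POSIX_FADV_WILLNEED for each heap block in `heap_blocks`,
 * skipping a block that repeats the one immediately before it.
 * Returns the number of advice calls issued.
 */
size_t advise_heap_blocks(int fd, const long *heap_blocks, size_t n)
{
    size_t issued = 0;

    for (size_t i = 0; i < n; i++)
    {
        /* Same heap page as the previous index entry: already advised. */
        if (i > 0 && heap_blocks[i] == heap_blocks[i - 1])
            continue;

        (void) posix_fadvise(fd, (off_t) heap_blocks[i] * HEAP_BLOCK_SZ,
                             HEAP_BLOCK_SZ, POSIX_FADV_WILLNEED);
        issued++;
    }
    return issued;
}
```

Only adjacent duplicates are filtered here; a block that reappears later in the array is advised again.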

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
