Re: Prereading using posix_fadvise

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Zeugswetter Andreas OSB SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Prereading using posix_fadvise
Date: 2008-03-28 17:24:15
Message-ID: 87ve36yd6o.fsf@oxford.xeocode.com
Lists: pgsql-hackers

Someone wrote:
>>>
>>> Should we consider only telling the kernel X pages ahead, meaning when
>>> we are on page 10 we tell it about page 16?

The patch I posted specifically handles bitmap heap scans. It does in fact
prefetch only a limited number of pages from the bitmap stream, with the limit
set by a GUC, but it tries to be a bit clever about ramping up gradually.

The real danger here, imho, is doing read-ahead for blocks the client never
ends up reading. By ramping up the read-ahead gradually as the client reads
records we protect against that.
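
To illustrate the idea, here is a minimal sketch of that kind of ramp-up. It
is not the code from the patch: the struct, the function names, the 8 kB block
size and the doubling rule are all assumptions made up for this example. The
only real API used is posix_fadvise() with POSIX_FADV_WILLNEED, on platforms
that provide it.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600       /* expose posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

#define BLOCK_SIZE 8192         /* assumed block size for this sketch */

/* Hypothetical per-scan state; not taken from the patch. */
typedef struct PrefetchState
{
    int target;      /* how many blocks ahead we currently allow */
    int max_target;  /* upper limit, e.g. from a configuration setting */
    int issued;      /* index of the next list entry to advise */
} PrefetchState;

/* Grow the window 0 -> 1 -> 2 -> 4 ..., capped at max_target. */
static void
prefetch_ramp_up(PrefetchState *st)
{
    if (st->target >= st->max_target)
        return;
    if (st->target == 0)
        st->target = 1;
    else
        st->target *= 2;
    if (st->target > st->max_target)
        st->target = st->max_target;
}

/*
 * Call just before reading blocks[current]. Widens the window one step,
 * then advises the kernel about any blocks inside the window that have
 * not been advised yet. Returns the number of advice calls made.
 */
static int
prefetch_ahead(int fd, PrefetchState *st,
               const long *blocks, int nblocks, int current)
{
    int issued_now = 0;

    assert(st != NULL && blocks != NULL);

    prefetch_ramp_up(st);

    /* Never advise a block the scan has already reached. */
    if (st->issued <= current)
        st->issued = current + 1;

    while (st->issued < nblocks && st->issued <= current + st->target)
    {
        /* Advice is only a hint, so this sketch ignores the result. */
        (void) posix_fadvise(fd,
                             (off_t) blocks[st->issued] * BLOCK_SIZE,
                             BLOCK_SIZE,
                             POSIX_FADV_WILLNEED);
        st->issued++;
        issued_now++;
    }
    return issued_now;
}
```

With this shape, a client that stops after the first few rows has only caused
a handful of advice calls. Once the window reaches max_target, each block read
triggers advice for one new block.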

> Heikki Linnakangas wrote:
>>
>> Yes. You don't want to fire off thousands of posix_fadvise calls
>> upfront. That'll just flood the kernel, and it will most likely ignore
>> any advice after the first few hundred or so. I'm not sure what the
>> appropriate amount of read ahead would be, though. Probably depends a
>> lot on the OS and hardware, and needs to be adjustable.

"Bruce Momjian" <bruce(at)momjian(dot)us> writes:
>
> And if you read-ahead too far the pages might get pushed out of the
> kernel cache before you ask to read them.

While these concerns aren't entirely baseless, the actual experiments seem to
show the point of diminishing returns is pretty far out there. Look at the
graphs linked below, keeping in mind that the X axis is the number of blocks
prefetched.

http://archives.postgresql.org/pgsql-hackers/2007-12/msg00088.php

The pink case is analogous to a bitmap index scan where the blocks are read in
order. In that case the point of diminishing returns is reached around 64
pages. But performance doesn't actually dip until around 512 pages. And even
prefetching 8,192 blocks the negative impact on performance is still much less
severe than using a smaller-than-optimal prefetch size.

This is on a piddly little 3-way RAID array. On a larger array you would want
even larger prefetch sizes.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
