
Re: [PATCH] Prefetch index pages for B-Tree index scans

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: John Lumby <johnlumby(at)hotmail(dot)com>
Cc: PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>, cedric(at)2ndquadrant(dot)com
Subject: Re: [PATCH] Prefetch index pages for B-Tree index scans
Date: 2012-11-01 18:13:30
Message-ID: CAGTBQpbu2M=-M7NUr6DWr0K8gUVmXVhwKohB-Cnj7kYS1AhH4A@mail.gmail.com
Lists: pgsql-hackers
On Thu, Nov 1, 2012 at 1:37 PM, John Lumby <johnlumby(at)hotmail(dot)com> wrote:
>
> Claudio wrote :
>>
>> Oops - forgot to effectively attach the patch.
>>
>
> I've read through your patch and the earlier posts by you and Cédric.
>
> This is very interesting. You chose to prefetch index btree (key-ptr) pages
> whereas I chose to prefetch the data pages pointed to by the key-ptr pages.
> Never mind why -- I think they should work very well together - as both have
> been demonstrated to produce improvements. I will see if I can combine them,
> git permitting (as of course their changed file lists overlap).

Check the latest patch; it contains heap page prefetching too.

> I was surprised by this design decision :
>     /* start prefetch on next page, but not if we're reading sequentially already, as it's counterproductive in those cases */
> Is it really? Are you assuming that it's redundant with posix_fadvise for this case?
> I think possibly when async_io is also in use by the postgresql prefetcher,
> this decision could change.

async_io may indeed make that logic obsolete, but redundancy with
posix_fadvise isn't the trouble there. The trouble is that the kernel
stops doing read-ahead when a call to posix_fadvise comes in. I noticed
the performance hit and checked the kernel's code: the call effectively
changes the prediction mode from sequential to fadvise, negating the
kernel's (assumed) prefetch logic.
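
The decision discussed above can be sketched roughly as follows. This is
an illustration only, not code from the patch: the helper names
(is_sequential, maybe_prefetch), the 8192-byte BLOCK_SIZE and the
"previous block + 1" test for sequential access are all assumptions made
for the example. Only posix_fadvise() and POSIX_FADV_WILLNEED are real
interfaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>

#define BLOCK_SIZE 8192         /* assumed block size for this sketch */

/* Hypothetical helper: true when next_block directly follows prev_block. */
static bool
is_sequential(long prev_block, long next_block)
{
    return next_block == prev_block + 1;
}

/*
 * Hypothetical helper: hint the kernel about next_block unless the scan
 * is already sequential, in which case the hint is skipped so the
 * kernel's own read-ahead is left undisturbed.
 *
 * Returns 1 if a hint was issued, 0 if it was skipped, -1 on error.
 */
static int
maybe_prefetch(int fd, long prev_block, long next_block)
{
    int rc;

    assert(next_block >= 0);

    if (is_sequential(prev_block, next_block))
        return 0;

    /* posix_fadvise returns 0 on success, or an error number on failure */
    rc = posix_fadvise(fd, (off_t) next_block * BLOCK_SIZE,
                       BLOCK_SIZE, POSIX_FADV_WILLNEED);
    return rc == 0 ? 1 : -1;
}
```

A caller would track the last block it read and call
maybe_prefetch(fd, last_block, wanted_block) before the actual read.
Whether skipping the hint still pays off once async_io is in the picture
is exactly the open question raised above.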

> However I think in some environments the async-io has significant benefits over
> posix-fadvise,  especially (of course!)   where access is very non-sequential,
> but even also for sequential if there are many concurrent conflicting sets of sequential
> command streams from different backends
> (always assuming the RAID can manage them concurrently).

I've mused about the possibility of batching async_io requests and using
the scatter/gather API instead of sending tons of requests to the
kernel. I think doing so would enable a zero-copy path that could very
possibly yield big speed improvements when memory bandwidth is the
bottleneck.
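
As a rough illustration of the scatter/gather half of that idea, the
sketch below reads several consecutive blocks into separate buffers with
a single preadv() call instead of one read per block. It rests on several
assumptions: preadv() is a synchronous vectored read, so it shows the
batching but not the async part; the function name read_blocks_batched,
BLOCK_SIZE and MAX_BATCH are made up for the example; and nothing here
demonstrates the zero-copy path itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>

#define BLOCK_SIZE 8192         /* assumed block size for this sketch */
#define MAX_BATCH  16           /* assumed upper bound on blocks per call */

/*
 * Hypothetical helper: read nblocks consecutive blocks, starting at
 * first_block, into the separate buffers bufs[0..nblocks-1] using one
 * system call. Each buffer must hold at least BLOCK_SIZE bytes.
 *
 * Returns the number of bytes read (possibly short), or -1 on error.
 */
static ssize_t
read_blocks_batched(int fd, long first_block, char *bufs[], int nblocks)
{
    struct iovec iov[MAX_BATCH];
    int i;

    assert(nblocks > 0 && nblocks <= MAX_BATCH);

    /* One iovec entry per destination buffer: the "scatter" list. */
    for (i = 0; i < nblocks; i++)
    {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = BLOCK_SIZE;
    }

    return preadv(fd, iov, nblocks, (off_t) first_block * BLOCK_SIZE);
}
```

This only covers blocks that are contiguous in the file; batching
requests for scattered blocks would need a true asynchronous submission
interface rather than a single vectored read.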

