
FW: [PATCH] Prefetch index pages for B-Tree index scans

From: John Lumby <johnlumby(at)hotmail(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>, <klaussfreire(at)gmail(dot)com>
Cc: <cedric(at)2ndquadrant(dot)com>
Subject: FW: [PATCH] Prefetch index pages for B-Tree index scans
Date: 2012-11-01 19:41:16
Message-ID: COL116-W28048CDCCE2DC5D30C4340A3600@phx.gbl
Lists: pgsql-hackers

Claudio wrote:
>
> Check the latest patch, it contains heap page prefetching too.
>

Oh yes I see. I missed that - I was looking in the wrong place.
I do have one question about the way you did it: by placing the
prefetch heap-page calls in _bt_next, which effectively means inside
a call from the index AM's index_getnext_tid to btgettuple, are you sure
you are synchronizing your prefetches of heap pages with the index AM's
ReadBuffer calls for heap pages? I.e., are you complying with this comment
from nodeBitmapHeapscan.c about prefetching its bitmap heap pages in
the bitmap-index-scan case:

* We issue prefetch requests *after* fetching the current page to try
* to avoid having prefetching interfere with the main I/O.

I can't really tell whether your design conforms to this, nor do I
know whether it is important, but I decided to do it in the same manner,
and so implemented the heap-page prefetching in index_fetch_heap.
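
To make the ordering in that comment concrete, here is a minimal standalone
sketch in plain POSIX C. It is not PostgreSQL code and not taken from either
patch: read_then_prefetch and PAGE_SZ are hypothetical names, and the 8192-byte
page size is an assumption chosen to match the default BLCKSZ. It reads the
current page first and only then issues the posix_fadvise hint for the next one.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* exposes pread() and posix_fadvise() */
#endif
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

#define PAGE_SZ 8192        /* assumed page size (PostgreSQL's default BLCKSZ) */

/*
 * Hypothetical helper: read page `cur` of `fd` into `buf`, and only after
 * that read has completed hint the kernel that page `next` will be needed.
 * Pass next < 0 to skip the hint. Returns the byte count from pread(),
 * or -1 if the read failed.
 */
ssize_t
read_then_prefetch(int fd, long cur, long next, char *buf)
{
    ssize_t n = pread(fd, buf, PAGE_SZ, (off_t) cur * PAGE_SZ);

    if (n < 0)
        return -1;

    /* The hint is advisory, so its return value is deliberately ignored. */
    if (next >= 0)
        (void) posix_fadvise(fd, (off_t) next * PAGE_SZ, PAGE_SZ,
                             POSIX_FADV_WILLNEED);
    return n;
}
```

A caller walking a list of heap block numbers would pass the block it needs
now as cur and the block it expects to need soon as next. Whether this
ordering actually matters for performance is exactly the open question above.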

>
> async_io indeed may make that logic obsolete, but it's not redundant
> posix_fadvise calls that are the trouble there, but the fact that the
> kernel stops doing read-ahead when a call to posix_fadvise comes. I noticed
> the performance hit, and checked the kernel's code. It effectively
> changes the prediction mode from sequential to fadvise, negating the
> (assumed) kernel's prefetch logic.
>
I did not know that. Very interesting.


>
> I've mused about the possibility to batch async_io requests, and use
> the scatter/gather API instead of sending tons of requests to the
> kernel. I think doing so would enable a zero-copy path that could very
> possibly imply big speed improvements when memory bandwidth is the
> bottleneck.

I think you are totally correct on this point. If I recall, the
glibc (librt) aio does have an lio_listio, but it is either a no-op
or just loops over the list, I forget which (I don't have its source right now),
but in any case I am sure there is potential for implementing such a facility.
But to be really effective, it should be implemented in the kernel itself,
which we don't have today.
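
As a rough illustration of the scatter/gather idea only, here is a small
sketch using the synchronous preadv call, which fills several separate page
buffers from one contiguous file range in a single system call. This is not
asynchronous I/O and not lio_listio, so it shows the "one request, many
buffers" shape rather than the batched async facility discussed above.
read_pages_scattered, PAGE_SZ and MAX_BATCH are hypothetical names, and the
8192-byte page size is an assumption.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE     /* exposes preadv() on glibc */
#endif
#include <sys/types.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <unistd.h>

#define PAGE_SZ   8192      /* assumed page size */
#define MAX_BATCH 16        /* arbitrary cap on pages per call */

/*
 * Hypothetical helper: read `npages` consecutive pages, starting at page
 * `first_page`, into the separate buffers bufs[0..npages-1] with a single
 * preadv() call. Returns the total bytes read, or -1 on error or if
 * npages is outside 1..MAX_BATCH.
 */
ssize_t
read_pages_scattered(int fd, long first_page, char *bufs[], int npages)
{
    struct iovec iov[MAX_BATCH];

    if (npages < 1 || npages > MAX_BATCH)
        return -1;

    /* One iovec entry per destination page buffer. */
    for (int i = 0; i < npages; i++)
    {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = PAGE_SZ;
    }
    return preadv(fd, iov, npages, (off_t) first_page * PAGE_SZ);
}
```

The obvious limitation is that the file range must be contiguous; only the
destination buffers can be scattered. Pages at unrelated offsets would still
need separate requests or a true list-I/O facility.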

John