
Re: [PATCH] Prefetch index pages for B-Tree index scans

From:	Greg Smith <greg(at)2ndQuadrant(dot)com>
To:	Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc:	John Lumby <johnlumby(at)hotmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>, cedric(at)2ndquadrant(dot)com
Subject:	Re: [PATCH] Prefetch index pages for B-Tree index scans
Date:	2012-11-02 01:59:29
Message-ID:	50932901.8030109@2ndQuadrant.com
Lists:	pgsql-hackers

On 11/1/12 6:13 PM, Claudio Freire wrote:

> posix_fadvise what's the trouble there, but the fact that the kernel
> stops doing read-ahead when a call to posix_fadvise comes. I noticed
> the performance hit, and checked the kernel's code. It effectively
> changes the prediction mode from sequential to fadvise, negating the
> (assumed) kernel's prefetch logic.

That's really interesting. There was a patch submitted at one point to
use POSIX_FADV_SEQUENTIAL on sequential scans; that wasn't a repeatable
improvement either, so it was dropped:
http://archives.postgresql.org/pgsql-hackers/2008-10/msg01611.php
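For readers who haven't used the hint, here is a minimal sketch of a
sequential scan that passes POSIX_FADV_SEQUENTIAL before reading. The
function name scan_sequential and the 8 kB block size are illustrative
assumptions, not code from the dropped patch. The advice is optional, so
a failure is only reported, and whether it helps at all depends on the
kernel's read-ahead heuristics, which is exactly what proved hard to
predict.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLCKSZ 8192 /* assumed block size, matching PostgreSQL's default */

/* Read the whole file block by block after hinting sequential access.
 * Returns total bytes read, or -1 if open() or read() fails. */
long scan_sequential(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0 with len 0 means "the whole file". posix_fadvise returns
     * an error number directly instead of setting errno. */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: error %d (ignored)\n", rc);

    char buf[BLCKSZ];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;

    close(fd);
    return n < 0 ? -1 : total;
}
```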

The Linux posix_fadvise implementation never seemed well liked by the
kernel developers. Quirky behavior like this popped up all the time
during the period when effective_io_concurrency was being added. I
wonder how far back the fadvise/read-ahead conflict goes.
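Index prefetching of the kind this patch proposes depends on
POSIX_FADV_WILLNEED rather than the sequential hint. A rough sketch of a
bounded prefetch window over a list of (possibly random) block numbers
follows, loosely in the spirit of effective_io_concurrency. The names
read_with_prefetch and prefetch_block and the distance parameter are
hypothetical; this is a simplification, not PostgreSQL's own prefetch
code.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192 /* assumed block size, matching PostgreSQL's default */

/* Ask the kernel to start reading one block. The advice is optional,
 * so any error returned by posix_fadvise is deliberately ignored. */
static void prefetch_block(int fd, long blkno)
{
    (void) posix_fadvise(fd, (off_t) blkno * BLCKSZ, BLCKSZ,
                         POSIX_FADV_WILLNEED);
}

/* Read blocks[0..nblocks) in the given order. Before reading blocks[i],
 * make sure blocks[i] through blocks[i + distance] have been hinted.
 * Returns the number of full blocks read. */
size_t read_with_prefetch(int fd, const long *blocks, size_t nblocks,
                          size_t distance)
{
    char buf[BLCKSZ];
    size_t hinted = 0;
    size_t nread = 0;

    for (size_t i = 0; i < nblocks; i++) {
        /* Top up the prefetch window ahead of the current block. */
        while (hinted < nblocks && hinted <= i + distance)
            prefetch_block(fd, blocks[hinted++]);

        if (pread(fd, buf, BLCKSZ, (off_t) blocks[i] * BLCKSZ) == BLCKSZ)
            nread++;
    }
    return nread;
}
```

A larger distance keeps more hints outstanding, which only pays off if
the storage can actually service several requests at once.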

> I've mused about the possibility to batch async_io requests, and use
> the scatter/gather API instead of sending tons of requests to the
> kernel. I think doing so would enable a zero-copy path that could very
> possibly imply big speed improvements when memory bandwidth is the
> bottleneck.
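The scatter/gather idea in its simplest form is preadv(), which fills
several separate buffers from one file range in a single system call.
This is a minimal sketch assuming 8 kB blocks and a hypothetical helper
named read_blocks_scattered. Note that preadv only covers one contiguous
byte range, so by itself it does not batch scattered index pages; batch
submission interfaces such as POSIX lio_listio or Linux io_submit are
closer to what is being described, and this sketch says nothing about
whether any of them gives a zero-copy path.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define BLCKSZ 8192   /* assumed block size, matching PostgreSQL's default */
#define MAX_BATCH 16  /* arbitrary cap on buffers per call for this sketch */

/* Read `count` consecutive blocks starting at `first_blk` into the
 * separate buffers bufs[0..count) with a single preadv() call.
 * Returns bytes read, or -1 on error or if count is out of range. */
ssize_t read_blocks_scattered(int fd, long first_blk, char *bufs[], int count)
{
    struct iovec iov[MAX_BATCH];

    if (count < 0 || count > MAX_BATCH)
        return -1;

    for (int i = 0; i < count; i++) {
        iov[i].iov_base = bufs[i]; /* each block lands in its own buffer */
        iov[i].iov_len = BLCKSZ;
    }
    return preadv(fd, iov, count, (off_t) first_blk * BLCKSZ);
}
```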

Another possibly useful bit of history for you: Greg Stark wrote a
test program that used async I/O effectively on both Linux and Solaris.
Unfortunately, it was hard to make that work given how Postgres does
its buffer I/O, and given that it uses processes rather than threads.
This looks like the place where he commented on why:

http://postgresql.1045698.n5.nabble.com/Multi-CPU-Queries-Feedback-and-or-suggestions-wanted-td1993361i20.html

The part from him that I think is relevant:

"In the libaio view of the world you initiate io and either get a
callback or call another syscall to test if it's complete. Either
approach has problems for Postgres. If the process that initiated io
is in the middle of a long query it might take a long time, or even
never get back to complete the io. The callbacks use threads...

And polling for completion has the problem that another process could
be waiting on the io and can't issue a read as long as the first
process has the buffer locked and io in progress. I think aio makes a
lot more sense if you're using threads so you can start a thread to
wait for the io to complete."
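The "initiate, then poll for completion" model in that quote can be
sketched as follows. Greg Stark was talking about libaio; this sketch
swaps in POSIX AIO (<aio.h>) instead, whose glibc implementation happens
to use helper threads internally. The function name aio_read_and_poll is
hypothetical, and on glibc versions older than 2.34 it may need -lrt to
link.

```c
#define _GNU_SOURCE
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Start an asynchronous read, then poll until it completes.
 * Returns bytes read, or -1 on failure. */
ssize_t aio_read_and_poll(int fd, void *buf, size_t len, off_t offset)
{
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE; /* no signal or callback */

    if (aio_read(&cb) != 0)
        return -1;

    /* Completion can only be checked through this aiocb, which lives in
     * this process's memory: another process waiting for the same page
     * has no way to poll it, which is the objection quoted above. */
    const struct aiocb *list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        (void) aio_suspend(list, 1, NULL); /* real code would do work here */

    int err = aio_error(&cb);
    ssize_t n = aio_return(&cb); /* reap the finished request exactly once */
    return err == 0 ? n : -1;
}
```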

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

In response to

Re: [PATCH] Prefetch index pages for B-Tree index scans at 2012-11-01 18:13:30 from Claudio Freire

Responses

Re: [PATCH] Prefetch index pages for B-Tree index scans at 2012-11-02 05:05:02 from Claudio Freire
