Re: Extended Prefetching using Asynchronous IO - proposal and patch

From: John Lumby <johnlumby(at)hotmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extended Prefetching using Asynchronous IO - proposal and patch
Date: 2014-06-23 21:43:06
Message-ID: BAY175-W364D73BCD54C4E4351AA63A31F0@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

----------------------------------------
> Date: Thu, 19 Jun 2014 15:43:44 -0300
> Subject: Re: Extended Prefetching using Asynchronous IO - proposal and patch
> From: klaussfreire(at)gmail(dot)com
> To: stark(at)mit(dot)edu
> CC: hlinnakangas(at)vmware(dot)com; johnlumby(at)hotmail(dot)com; pgsql-hackers(at)postgresql(dot)org
>
> On Thu, Jun 19, 2014 at 2:49 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
>> I don't think the threaded implementation on Linux is the one to use
>> though.  [... ] The overhead of thread communication
>> will completely outweigh any advantage over posix_fadvise's partial
>> win.
>
> What overhead?
>
> The only communication is through a "done" bit and associated
> synchronization structure when *and only when* you want to wait on it.
>

Threads do cost some extra CPU,  but provided the system had some
spare CPU capacity,  then performance improves because of reduced IO wait.
I quoted a measured improvement of  17% - 18% improvement in the README
along with some more explanation of when the asyc IO gives and improvement.

> Furthermore, posix_fadvise is braindead on this use case, been there,
> done that. What you win with threads is a better postgres-kernel
> interaction, even if you loose some CPU performance it's gonna beat
> posix_fadvise by a large margin.
>
> [...]
>
>> When I investigated this I found the buffer manager's I/O bits seemed
>> to already be able to represent the state we needed (i/o initiated on
>> this buffer but not completed). The problem was in ensuring that a
>> backend would process the i/o completion promptly when it might be in
>> the midst of handling other tasks and might even get an elog() stack
>> unwinding. The interface that actually fits Postgres best might be the
>> threaded interface (orthogonal to the threaded implementation
>> question) which is you give aio a callback which gets called on a
>> separate thread when the i/o completes. The alternative is you give
>> aio a list of operation control blocks and it tells you the state of
>> all the i/o operations. But it's not clear to me how you arrange to do
>> that regularly, promptly, and reliably.
>
> Indeed we've been musing about using librt's support of completion
> callbacks for that.

For the most common case of a backend issues a PrefetchBuffer
and then that *same* backend issues ReadBuffer,  the posix aio works
ideally,  since there is no need for any callback or completion signal,
we simply check "is it complete" during the ReadBuffer.

It is when some *other* backend gets there first with the ReadBuffer that
things are a bit trickier.    The current version of the patch did polling for that case
but that drew criticism,    and so an imminent new version of the patch
uses the sigevent mechanism.    And there are other ways still.

In an earlier posting I reported that ,  in my benchmark,
99.8% of [FileCompleteaio] calls are from originator and only < 0.2% are not.so,  from a performance perspective,  only the common case really matters.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Greg Stark 2014-06-23 23:04:50 Re: Extended Prefetching using Asynchronous IO - proposal and patch
Previous Message Jim Nasby 2014-06-23 21:05:23 Re: Extended Prefetching using Asynchronous IO - proposal and patch

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2014-06-23 22:00:51 Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Previous Message Tomas Vondra 2014-06-23 21:30:36 Re: Sending out a request for more buildfarm animals?