Re: Extended Prefetching using Asynchronous IO - proposal and patch

From: John Lumby <johnlumby(at)hotmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject: Re: Extended Prefetching using Asynchronous IO - proposal and patch
Date: 2014-05-29 21:53:51
Message-ID: BAY175-W3241B07A89EC187C3D826A3240@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

> Date: Thu, 29 May 2014 18:00:28 -0300
> Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch
> From: klaussfreire(at)gmail(dot)com
> To: hlinnakangas(at)vmware(dot)com
> CC: johnlumby(at)hotmail(dot)com; pgsql-hackers(at)postgresql(dot)org
>
> On Thu, May 29, 2014 at 5:39 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
> > On 05/29/2014 11:34 PM, Claudio Freire wrote:
> >>
> >> On Thu, May 29, 2014 at 5:23 PM, Heikki Linnakangas
> >> <hlinnakangas(at)vmware(dot)com> wrote:
> >>>
> >>> On 05/29/2014 04:12 PM, John Lumby wrote:
> >>>>
> >>>>
> >>>>> On 05/28/2014 11:52 PM, John Lumby wrote:
> >>>>>
> >>>>> The patch seems to assume that you can put the aiocb struct in shared
> >>>>> memory, initiate an asynchronous I/O request from one process, and wait
> >>>>> for its completion from another process. I'm pretty surprised if that
> >>>>> works on any platform.
> >>>>
> >>>>
> >>>> It works on linux. Actually this ability allows the asyncio
> >>>> implementation to
> >>>> reduce complexity in one respect (yes I know it looks complex enough) :
> >>>> it makes waiting for completion of an in-progress IO simpler than for
> >>>> the existing synchronous IO case,. since librt takes care of the
> >>>> waiting.
> >>>> specifically, no need for extra wait-for-io control blocks
> >>>> such as in bufmgr's WaitIO()
> >>>
> >>>
> >>> [checks]. No, it doesn't work. See attached test program.

Thanks for checking and thanks for coming up with that test program.
However, yes, it really does work -- always (on linux).
Your test program is doing things in the wrong order -
it calls aio_suspend *before* aio_error.
However, the rule is, call aio_suspend *after* aio_error
and *only* if aio_error returns EINPROGRESS.

See the code changes to fd.c function FileCompleteaio()
to see how we have done it. And I am attaching corrected version
of your test program which runs just fine.

> >>>
> >>> It kinda seems to work sometimes, because of the way it's implemented in
> >>> glibc. The aiocb struct has a field for the result value and errno, and
> >>> when
> >>> the I/O is finished, the worker thread fills them in. aio_error() and
> >>> aio_return() just return the values of those fields, so calling
> >>> aio_error()
> >>> or aio_return() do in fact happen to work from a different process.
> >>> aio_suspend(), however, is implemented by sleeping on a process-local
> >>> mutex,
> >>> which does not work from a different process.
> >>>
> >>> Even if it worked on Linux today, it would be a bad idea to rely on it
> >>> from
> >>> a portability point of view. No, the only sane way to make this work is
> >>> that
> >>> the process that initiates an I/O request is responsible for completing
> >>> it.
> >>> If another process needs to wait for an async I/O to complete, we must
> >>> use
> >>> some other means to do the waiting. Like the io_in_progress_lock that we
> >>> already have, for the same purpose.
> >>
> >>
> >> But calls to it are timeouted by 10us, effectively turning the thing
> >> into polling mode.
> >
> >
> > We don't want polling... And even if we did, calling aio_suspend() in a way
> > that's known to be broken, in a loop, is a pretty crappy way of polling.

Well, as mentioned earlier, it is not broken. Whether it is efficient I am not sure.
I have looked at the mutex in aio_suspend that you mentioned and I am not
quite convinced that, if caller is not the original aio_read process,
it renders the suspend() into an instant timeout. I will see if I can verify that.
Where are you (Claudio) seeing 10us?

>
>
> Didn't fix that, but the attached patch does fix regression tests when
> scanning over index types other than btree (was invoking elog when the
> index am didn't have ampeeknexttuple)

Attachment Content-Type Size
aio-shmem-test.c text/plain 2.6 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2014-05-29 21:56:57 Re: Extended Prefetching using Asynchronous IO - proposal and patch
Previous Message Claudio Freire 2014-05-29 21:49:07 Re: Extended Prefetching using Asynchronous IO - proposal and patch

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-05-29 21:56:57 Re: Extended Prefetching using Asynchronous IO - proposal and patch
Previous Message Claudio Freire 2014-05-29 21:49:07 Re: Extended Prefetching using Asynchronous IO - proposal and patch