Re: Extended Prefetching using Asynchronous IO - proposal and patch

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: John Lumby <johnlumby(at)hotmail(dot)com>, pgsql hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>
Subject: Re: Extended Prefetching using Asynchronous IO - proposal and patch
Date: 2014-05-30 07:15:44
Message-ID: 53883020.2020606@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On 05/30/2014 12:53 AM, John Lumby wrote:
>> Date: Thu, 29 May 2014 18:00:28 -0300
>> Subject: Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch
>> From: klaussfreire(at)gmail(dot)com
>> To: hlinnakangas(at)vmware(dot)com
>> CC: johnlumby(at)hotmail(dot)com; pgsql-hackers(at)postgresql(dot)org
>>
>> On Thu, May 29, 2014 at 5:39 PM, Heikki Linnakangas
>> <hlinnakangas(at)vmware(dot)com> wrote:
>>> On 05/29/2014 11:34 PM, Claudio Freire wrote:
>>>>
>>>> On Thu, May 29, 2014 at 5:23 PM, Heikki Linnakangas
>>>> <hlinnakangas(at)vmware(dot)com> wrote:
>>>>>
>>>>> On 05/29/2014 04:12 PM, John Lumby wrote:
>>>>>>
>>>>>>
>>>>>>> On 05/28/2014 11:52 PM, John Lumby wrote:
>>>>>>>
>>>>>>> The patch seems to assume that you can put the aiocb struct in shared
>>>>>>> memory, initiate an asynchronous I/O request from one process, and wait
>>>>>>> for its completion from another process. I'm pretty surprised if that
>>>>>>> works on any platform.
>>>>>>
>>>>>>
>>>>>> It works on linux. Actually this ability allows the asyncio
>>>>>> implementation to
>>>>>> reduce complexity in one respect (yes I know it looks complex enough) :
>>>>>> it makes waiting for completion of an in-progress IO simpler than for
>>>>>> the existing synchronous IO case,. since librt takes care of the
>>>>>> waiting.
>>>>>> specifically, no need for extra wait-for-io control blocks
>>>>>> such as in bufmgr's WaitIO()
>>>>>
>>>>>
>>>>> [checks]. No, it doesn't work. See attached test program.
>
> Thanks for checking and thanks for coming up with that test program.
> However, yes, it really does work -- always (on linux).
> Your test program is doing things in the wrong order -
> it calls aio_suspend *before* aio_error.
> However, the rule is, call aio_suspend *after* aio_error
> and *only* if aio_error returns EINPROGRESS.

I see no such a rule in the man pages of any of the functions involved.
And it wouldn't matter anyway; the behavior is exactly the same if you
aio_error() first.

> See the code changes to fd.c function FileCompleteaio()
> to see how we have done it. And I am attaching corrected version
> of your test program which runs just fine.

As Claudio mentioned earlier, the way FileCompleteaio() uses aio_suspend
is just a complicated way of polling. You might as well replace the
aio_suspend() calls with pg_usleep().

>>>>> It kinda seems to work sometimes, because of the way it's implemented in
>>>>> glibc. The aiocb struct has a field for the result value and errno, and
>>>>> when
>>>>> the I/O is finished, the worker thread fills them in. aio_error() and
>>>>> aio_return() just return the values of those fields, so calling
>>>>> aio_error()
>>>>> or aio_return() do in fact happen to work from a different process.
>>>>> aio_suspend(), however, is implemented by sleeping on a process-local
>>>>> mutex,
>>>>> which does not work from a different process.
>>>>>
>>>>> Even if it worked on Linux today, it would be a bad idea to rely on it
>>>>> from
>>>>> a portability point of view. No, the only sane way to make this work is
>>>>> that
>>>>> the process that initiates an I/O request is responsible for completing
>>>>> it.
>>>>> If another process needs to wait for an async I/O to complete, we must
>>>>> use
>>>>> some other means to do the waiting. Like the io_in_progress_lock that we
>>>>> already have, for the same purpose.
>>>>
>>>> But calls to it are timeouted by 10us, effectively turning the thing
>>>> into polling mode.
>>>
>>> We don't want polling... And even if we did, calling aio_suspend() in a way
>>> that's known to be broken, in a loop, is a pretty crappy way of polling.
>
> Well, as mentioned earlier, it is not broken. Whether it is efficient I am not sure.
> I have looked at the mutex in aio_suspend that you mentioned and I am not
> quite convinced that, if caller is not the original aio_read process,
> it renders the suspend() into an instant timeout. I will see if I can verify that.

I don't see the point of pursuing this design further. Surely we don't
want to use polling here, and you're relying on undefined behavior
anyway. I'm pretty sure aio_return/aio_error won't work from a different
process on all platforms, even if it happens to work on Linux. Even on
Linux, it might stop working if the underlying implementation changes
from the glibc pthread emulation to something kernel-based.

- Heikki

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Dmitry Samonenko 2014-05-30 07:44:07 Re: Fwd: libpq: indefinite block on poll during network problems
Previous Message Bhushan Pathak 2014-05-30 06:49:20 Re: Postgresql 9.2.4 - timezone error

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-05-30 07:25:35 Re: Fwd: Typo fixes in Solution.pm, part of MSVC scripts
Previous Message Amit Kapila 2014-05-30 05:00:42 Re: pg_sleep() doesn't work well with recovery conflict interrupts.