Re: PQexec() hangs on OOM

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: PQexec() hangs on OOM
Date: 2015-09-05 12:45:24
Message-ID: CAA4eK1Kt=vBtb-0jOS=x7xb0zvTYPa91D0yuGA=TtQQUH3ESew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Sep 4, 2015 at 12:55 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

>
> On Fri, Sep 4, 2015 at 2:07 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>>
>> It was actually a far cleaner approach to have a failure flag to decide
>>> if based on the async state process should move on to a failure code path
>>> or not.
>>>
>>
>> I think you have already tried, but it seems to me that if we can handle
>> it based on result status, that would be better, rather than introducing
>> new state, but anyway lets first try to sort out above reported problem.
>>
>
> So, looking at that again with a fresh eye, I noticed that
> getCopyDataMessage has a similar error handling when it fails on
> pqCheckInBufferSpace. So looking at what I added, it seems that I am trying
> to duplicate the protocol error handling even if everything is already
> present, so that's really overdoing it. Hence, let's simply drop this idea
> of new flag PGASYNC_FATAL and instead treat the OOM as a sync handling
> error with the server, like in the patch attached.
>
>
I think sync handling error, drops the connection which I feel should
be only done as a last resort for any error and if we are able to handle
OOM error for other kind of messages, then we should try to handle
it for Copy related messages as well. I have tried to do it for Copy
In and Copy Out protocol messages in the attached patch with logic
similar to what is currently used in code. The idea is that in Copy
mode if there is error we just return the error and the pending data
will be automatically discarded in next execution during PQexecStart().

> When emulating an OOM with this patch, I am getting this error instead of
> the infinite loop, which looks acceptable to me:
> =# copy aa to stdout;
> out of memory
> lost synchronization with server: got message type "H", length 5
> The connection to the server was lost. Attempting reset: Succeeded.
>
> The extra message handling I have added at the end of getCopyStart and
> getParamDescriptions still looks more adapted to me when a failure happens,
> so I kept it.
>

Sure, but then make the handling similar to getRowDescriptions() both for
failure and success case, that way code will be consistent.

Still I have not checked about COPY_BOTH and getParamDescriptions(),
see if you like what I have done in attached patch, then we can do the
similar for them as well.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
20150905_libpq_oom_v3.patch application/octet-stream 9.3 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2015-09-07 08:10:10 Re: PQexec() hangs on OOM
Previous Message Thomas Munro 2015-09-05 12:43:15 Re: GRANT USAGE ON SEQUENCE missing from psql command completion