Re: PQexec() hangs on OOM

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: PQexec() hangs on OOM
Date: 2015-07-06 17:13:46
Message-ID: 559AB74A.6070602@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 07/04/2015 03:40 PM, Michael Paquier wrote:
> On Sat, Jul 4, 2015 at 1:32 AM, Heikki Linnakangas wrote:
>> In short, the idea of returning a status code from parseInput(), instead of
>> just dealing with the error, was a bad one. Sorry I didn't have this
>> epiphany earlier...
>>
>> I came up with the attached. It fixes the few cases where we were currently
>> returning "need more data" when OOM happened, to deal with the OOM error
>> instead, by setting conn->errorMessage. How does this look to you?
>
> So this relies on the fact that when errorMessage is set subsequent
> messages are ignored, right? This looks neat.
>
> At the bottom of getAnotherTuple() in fe-protocol2.c if
> PQmakeEmptyPGresult returns NULL, shouldn't the error message be
> enforced to "out of memory"? It is an error code path, so an error
> will be set, but perhaps the message is incorrect.
>
> - if (!res->errMsg)
> - goto failure;
> + if (res)
> + {
> + res->resultStatus = isError ? PGRES_FATAL_ERROR :
> PGRES_NONFATAL_ERROR;
> + res->errMsg = pqResultStrdup(res, workBuf.data);
> + }
> If res->errMsg is NULL, we may have a crash later on when calling
> appendPQExpBufferStr on this field. I think that we should add an
> additional check on it.

Yeah, added.

I tested the various error paths this patch modifies with a debugger,
and found out that the getCopyStart changes were not doing much good.
The caller still waits for the COPY-IN result, so it still gets stuck.
So I left out that change.

The getParamDescriptions() changes were slightly broken. It didn't read
the whole input message with pqGetInt() etc., so pqParseInput3() threw
the "message contents do not agree with length in message type" error. I
started fixing that, by changing the error handling in that function to
be more like that in getRowDescriptions(), but then I realized that all
the EOF return cases in the pqParseInput3() subroutines are actually
dead code. pqParseInput3() always reads the whole message, before
passing it on to the right subroutine. That was documented for
getRowDescriptions() and getAnotherTuple(), but the rest of the
functions were more like the protocol version 2 code, prepared to deal
with incomplete messages. I think it would be good to refactor that,
removing the EOF return cases altogether. So I left out that change for
now as well.

That left me with the attached patch. It doesn't handle the
getParamDescription() case, nor the getCopyStart() case, but it's a
start. We don't necessarily need to fix everything in one go. Does this
look correct to you, as far as it goes?

- Heikki

Attachment Content-Type Size
improve-OOM-handling-in-libpq-3.patch text/x-diff 6.9 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message olivier.gosseaume 2015-07-06 17:29:30 Re: BUG #13484: Performance problem with logical decoding
Previous Message Tom Lane 2015-07-06 16:15:09 Re: BUG #13488: Wrong netmask calculation