Re: PQexec() hangs on OOM

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>
Cc: hlinnaka(at)iki(dot)fi, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: PQexec() hangs on OOM
Date: 2015-07-03 16:32:36
Message-ID: 5596B924.6020307@iki.fi
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-bugs

On 05/26/2015 10:01 AM, Michael Paquier wrote:
> On Thu, May 21, 2015 at 1:31 AM, Oleksandr Shulgin wrote:
>> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>>> So, attached is take three for all the other things above.
>>
>> There's still one call to pqGetErrorNotice3 that doesn't check returned
>> value for -2, in fe-connect.c. Shouldn't we also check it like this:
>>
>> [...]
>
> Yes, you are right. Take 4 attached is updated with something similar
> to what you sent to cover this code path.

This is still a few bricks shy of a load. Before this patch, if you run
out of memory when allocating a large result set - which is probably the
most common reason for OOM in libpq - you got this error:

postgres=# select generate_series(1, 10000000);
out of memory for query result

With this patch, I got:

postgres=# select generate_series(1, 10000000);
server sent data ("D" message) without prior row description ("T" message)

Looking at the patch again, I think we should actually leave
getAnotherTuple() alone. It's a lot nicer if getAnotherTuple() can
handle the OOM error itself than propagating it to the caller.

There's only one caller for getAnotherTuple(), but for
pqGetErrorNotice3() the same is even more true: it would be much nicer
if it could handle OOM itself, without propagating it to the caller. And
it well could do so. When it's processing an ERROR, it could just set
conn->errorMessage to "out of memory", and discard the error it received
from the server. When processing a NOTICE, it could likewise just send
"out of memory" to the NOTICE processsor, and discard the message from
the server. The real problem with pqGetErrorNotice3() today is that it
treats OOM the same as "no data available yet", and we can fix that by
reading but discarding the backend message. Like getAnotherTuple() does.

In short, the idea of returning a status code from parseInput(), instead
of just dealing with the error, was a bad one. Sorry I didn't have this
epiphany earlier...

I came up with the attached. It fixes the few cases where we were
currently returning "need more data" when OOM happened, to deal with the
OOM error instead, by setting conn->errorMessage. How does this look to you?

- Heikki

--
- Heikki

Attachment Content-Type Size
improve-OOM-handling-in-libpq-2.patch text/x-diff 10.2 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David G. Johnston 2015-07-03 18:53:44 Re: BUG #13485: JSONB To recordset not working with CamelCase
Previous Message Petr Jelinek 2015-07-03 14:59:57 Re: BUG #13126: table constraint loses its comment