Re: PQexec() hangs on OOM

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: PQexec() hangs on OOM
Date: 2015-12-14 17:11:43
Message-ID: 566EF84F.1030206@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 16/10/15 05:00, Michael Paquier wrote:
> On Thu, Oct 15, 2015 at 11:28 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Sun, Oct 11, 2015 at 6:31 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
>> wrote:
>>>
>>> Yeah, this behavior is caused by this piece of code:
>>> @@ -600,7 +601,9 @@ getRowDescriptions(PGconn *conn, int msgLength)
>>> advance_and_error:
>>> /* Discard unsaved result, if any */
>>> if (result && result != conn->result)
>>> PQclear(result);
>>> An idea may be to check for conn->result->resultStatus !=
>>> PGRES_FATAL_ERROR to concatenate correctly the error messages using
>>> pqCatenateResultError. But just returning the first error encountered
>>> as you mention would be more natural. So I updated the patch this way.

I pushed the ParamDescription fix. I'm not quite sure on the details of
the COPY patch. Do you remember what this change in PQexecFinish is for:

> @@ -1991,6 +1995,9 @@ PQexecFinish(PGconn *conn)
> * We have to stop if we see copy in/out/both, however. We will resume
> * parsing after application performs the data transfer.
> *
> + * Stop if we are in copy mode and error has occurred, the pending results
> + * will be discarded during next execution in PQexecStart.
> + *
> * Also stop if the connection is lost (else we'll loop infinitely).
> */
> lastResult = NULL;
> @@ -2020,6 +2027,11 @@ PQexecFinish(PGconn *conn)
> result->resultStatus == PGRES_COPY_BOTH ||
> conn->status == CONNECTION_BAD)
> break;
> + else if ((conn->asyncStatus == PGASYNC_COPY_IN ||
> + conn->asyncStatus == PGASYNC_COPY_OUT ||
> + conn->asyncStatus == PGASYNC_COPY_BOTH) &&
> + result->resultStatus == PGRES_FATAL_ERROR)
> + break;
> }
>
> return lastResult;

I don't immediately see why that's needed. I ran the little tester
program I wrote earlier
(http://www.postgresql.org/message-id/55FC501A.5000409@iki.fi), with and
without that change, and it behaved the same.

Also, when I modify my tester program so that when malloc fails, all
subsequent mallocs fail too, I'm still getting the "another command is
already in progress" error. The problem scenario is:

1. Application calls PQexec("COPY foo FROM STDIN")
2. getCopyStart sets conn->asyncStatus = PGASYNC_COPY_IN.
3. PQgetResult calls getCopyResult() but it fails with OOM.

Now, libpq is already in PGASYNC_COPY_IN mode, but PQexec() returned
NULL because of the OOM, so the application doesn't know that it's in
copy mode. When the application calls PQexec() again, it fails with the
"another command is already in progress" error.

Let's see if we can fix that too, or if we should put that off to yet
another patch.

- Heikki

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message ryan 2015-12-14 22:47:29 BUG #13818: PostgreSQL crashes after cronjob runs as "postgres"
Previous Message sienkomarcin 2015-12-14 11:45:27 BUG #13817: Query planner strange choose while select/count small part of big table - complete