Re: PQexec() hangs on OOM

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: PQexec() hangs on OOM
Date: 2015-09-18 22:17:46
Message-ID: CAB7nPqQxMZkq+kfWK4Ad82m8t6fRG670T-X13hjCR--XX7RZbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Sep 18, 2015 at 12:55 PM, Heikki Linnakangas wrote:
> On 09/14/2015 04:36 AM, Michael Paquier wrote:
> Patches 0001 and 0002 look good to me. Two tiny nitpicks:
>
>> --- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
>> +++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
>> @@ -445,6 +445,7 @@ libpqrcv_PQexec(const char *query)
>> if (PQresultStatus(lastResult) == PGRES_COPY_IN ||
>> PQresultStatus(lastResult) == PGRES_COPY_OUT ||
>> PQresultStatus(lastResult) == PGRES_COPY_BOTH ||
>> + PQresultStatus(lastResult) == PGRES_FATAL_ERROR ||
>> PQstatus(streamConn) == CONNECTION_BAD)
>> break;
>> }
>
>
> Is this really needed? AFAICS, the loop will terminate on PGRES_FATAL_ERROR
> too. Without this patch, it will always return the last result from the
> query, whether or not it was an error result. With this patch, it will
> return the first error encountered, or last non-error result if there is no
> error. In practice, there is no difference because we don't expect multiple
> results from the server.

Yes, you are right. PQgetResult is doing the necessary processing
here. This seems to be some remnant of previous testing around the
replication protocol, and while it has no real incidence for the
server, it may for applications that rely on libpqwalreceiver.so...
Hence let's remove it. Thanks for pointing out that.

>> @@ -1343,31 +1348,33 @@ getNotify(PGconn *conn)
>> [...]
>> + if (result && nfields > 0)
>
> 'result' is never NULL here, so the NULL test is still not needed.

I thought I removed all of them.

> The 0003 patch also looks OK to me as far as it goes. I feel that some more
> refactoring would be in order, though. There are still many different styles
> for handling a command: sometimes it's done inline in the case-statement,
> sometimes it calls out to a separate function. Sometimes the function moves
> inStart, sometimes it does not. On error (e.g. short message), sometimes the
> function returns 'false', sometimes it sets an error message itself.

Sure. I had in mind a backpatch as the main goal of this set of
patches, and there is surely more room than simply removing all the
dead code paths that 0003 is currently doing. Why not simply fixing
the holes in the current logic to begin, and have the refactoring
patch as something to work on for the next CF? I don't mind working on
it FWIW. That's just a larger topic than what is discussed on this
thread.

> Patches 0001 and 0002 look small enough that we probably could backpatch
> them, so that would leave 0003 as a optional master-only cleanup.

That makes sense, and what I had in mind as well. Attached are updated
patches, feel free to ignore 0003, that's the same as before only
removing the dead code.

> I spent some time writing a regression test for this. This is pretty
> difficult to test otherwise, and it's a good candidate for automated,
> exchaustive testing. The only difficulty is injecting the malloc() failures
> in a cross-platform way. I ended up with a glibc-only solution that uses
> glibc's malloc() hooks. Other alternatives I considered were Linux-specific
> LD_PRELOAD, GNU ld specific -wrap option. One cross-platform solution would
> be to #define malloc() and friends to wrapper functions, and compile a
> special version of libpq. Anyway, I came up with the attached test program.
> It hangs without your fixes, and works with the patches applied. Thoughts?

That's not going to work on OSX/BSD (as those callbacks are not
present) and Windows, but testing that on Linux only looks fine to me,
and locating it where it is, aka in a non-default compile path is
definitely better. You should add .gitignore in
src/interfaces/libpq/test/ (uri-regress is missing as well btw).
Regards,
--
Michael

Attachment Content-Type Size
0001-Fix-OOM-error-handling-in-COPY-protocol-of-libpq.patch text/x-diff 6.5 KB
0002-Fix-OOM-error-handling-in-BIND-protocol-of-libpq.patch text/x-diff 3.6 KB
0003-Remove-dead-code-of-libpq-protocol-3.patch text/x-diff 4.8 KB

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Amit Kapila 2015-09-19 04:32:52 Re: PQexec() hangs on OOM
Previous Message Heikki Linnakangas 2015-09-18 17:55:38 Re: PQexec() hangs on OOM