Re: PQexec() hangs on OOM

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: PQexec() hangs on OOM
Date: 2016-02-06 12:11:58
Message-ID: CAB7nPqRp=RLfaeV95cu1wAChgmiFF7GsB5EZZPBHTRq9a2ty8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-bugs

On Tue, Dec 15, 2015 at 3:54 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
>
>
> On Tue, Dec 15, 2015 at 2:22 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>> On Mon, Dec 14, 2015 at 10:41 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
>> wrote:
>>> On 16/10/15 05:00, Michael Paquier wrote:
>>> I don't immediately see why that's needed. I ran the little tester
>>> program I wrote earlier
>>> (http://www.postgresql.org/message-id/55FC501A.5000409@iki.fi), with and
>>> without that change, and it behaved the same.
>>>
>>
>> Without this change, it can ignore the OOM error for copy command.
>> As an example, for command like "copy t1 from stdin;", when the
>> flow reaches getCopyStart() in debugger, I have manually ensured
>> that PQmakeEmptyPGresult() return NULL by overwriting the return
>> value of malloc to zero and then it enters the error path
>> (advance_and_error) and then I just press continue and what I observed
>> is without above change it just doesn't show OOM error and with the
>> above change it properly reports OOM error.
>
> Yeah, that's basically this kind of trace:
> * frame #0: 0x0000000100e8edad
> libpq.5.dylib`getCopyStart(conn=0x00007fb238404090, copytype=PGRES_COPY_OUT,
> msgLength=5) + 29 at fe-protocol3.c:1400
> frame #1: 0x0000000100e8d3d6
> libpq.5.dylib`pqParseInput3(conn=0x00007fb238404090) + 2374 at
> fe-protocol3.c:383
> frame #2: 0x0000000100e7f5ad
> libpq.5.dylib`parseInput(conn=0x00007fb238404090) + 45 at fe-exec.c:1652
> frame #3: 0x0000000100e7f86b
> libpq.5.dylib`PQgetResult(conn=0x00007fb238404090) + 251 at fe-exec.c:1727
> frame #4: 0x0000000100e7fdad
> libpq.5.dylib`PQexecFinish(conn=0x00007fb238404090) + 29 at fe-exec.c:2007
> frame #5: 0x0000000100e7fbdc
> libpq.5.dylib`PQexec(conn=0x00007fb238404090, query=0x00007fb23841b760) + 92
> at fe-exec.c:1838
> frame #6: 0x0000000100dd2f35 psql`SendQuery(query=0x00007fb23841b760) +
> 965 at common.c:1064
> frame #7: 0x0000000100dd99f4 psql`MainLoop(source=0x00007fff74944a90) +
> 2388 at mainloop.c:292
> frame #8: 0x0000000100de1988 psql`main(argc=1, argv=0x00007fff5ee36510)
> + 3304 at startup.c:390
>
> If an error occurs in the COPY start message, this allows to fetch correctly
> the error that is generated by getCopyResult(), and that what has been
> written to conn->errorMessage does not get overwritten by subsequent
> messages, allowing to output the correct report to the user.

Heikki and I had a short talk about this issue in Moscow, and Heikki
has mentioned me that he cannot convince himself that this would be a
sane change for the COPY protocol. Based on his experience with those
things, I'd rather put every back on the table and look again at this
issue with fresh eyes with more testing involving in-core and external
tools that use the COPY protocol with libpq routines. I recall having
doing some testing with pg_receivexlog and pg_recvlogical but I'll
look at that again. Also, I know one tool which would be interesting
to test: pg_bulkload. Do you know of any other things worth looking
at? We want things that link with libpq directly.

So, in short, I am planning to do more testing, even if it has proved
to be rather clear that this change has the advantage to make the
error stack to be clearer. But we may be missing something, so let's
see.
--
Michael

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2016-02-06 13:03:15 Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby
Previous Message jeffrey.osborn 2016-02-06 00:41:59 BUG #13928: Initdb.bat will not install if path has a space.