Re: PQexec() hangs on OOM

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: PQexec() hangs on OOM
Date: 2015-09-07 13:41:34
Message-ID: CAB7nPqSBwLkJdj2VV=czMUERuE+TKyNmu3bCbfYr2YFyhOUVMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Sep 7, 2015 at 6:45 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Mon, Sep 7, 2015 at 1:40 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
> wrote:
>>
>> On Sat, Sep 5, 2015 at 9:45 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> >
>>
>> So, I think that the right approach would be to leave immediately
>> pqParseInput3 should an error happen, switching asyncStatus to leave
>> the loop in PQgetResult.
>
> Sure, if you think that works, then do it that way.
>
>> This seems as well to work correctly with
>> PGRES_COPY_BOTH (I emulated failures with pg_basebackup and errors
>> were caught up correctly. This brings back of course the introduction
>> of a new flag PGASYNC_FATAL
>
> I think we should try not to introduce this new flag, as that is not merely
> a flag, but rather a state in query-execution state machine. Now if we
> introduce a new error state into that state-machine, then it doesn't seem
> like a good idea to do that only for some of the paths and doing it for all
> other paths is a call for somewhat larger verification cycle (to see if it
> works in all paths as previously).

Well, the previous patch you sent added code paths to manage the
failures with COPY protocol directly in the COPY routines in a way
similar to what the introduction of PGASYNC_FATAL is doing, except
that PGASYNC_FATAL has the advantage to let the end of PQgetResult
know that a failure has happened, centralizing the error check. Also,
it seems to me that switching getParamDescriptions to an error state
is going to need this flag, or at least a new flag to exit the loop
when parsing the input.

Regarding the other messages that could use this flag, I just
double-checked and it seems that all the other calls return an EOF
when they do not have enough data received from the backend or they
directly switch to another PGASYNC state, so they do not need a
specific fatal error handling. Please let me know if I missed
something.

In any case, attached are two patches:
- 0001 adds the OOM/failure handling for the BIND and COPY start
messages. This time the connection is not dropped. After a failure,
successive commands work as well, this addresses the previous issue
you reported.
- 0002 is a cleanup bonus, getRowDescriptions and getAnotherTuple have
some dead code that I think would be better removed, those are
remnants from a copy/paste from the similar code of protocol 2.
Regards,
--
Michael

Attachment Content-Type Size
0001-Prevent-COPY-start-and-BIND-to-freeze-in-libpq-on-OO.patch binary/octet-stream 10.2 KB
0002-Remove-dead-code-of-libpq-protocol-3.patch binary/octet-stream 4.8 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2015-09-07 15:13:15 Re: PQexec() hangs on OOM
Previous Message Amit Kapila 2015-09-07 09:45:04 Re: PQexec() hangs on OOM