Re: OOM in libpq and infinite loop with getCopyStart()

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: OOM in libpq and infinite loop with getCopyStart()
Date: 2016-04-06 06:09:13
Message-ID: CAB7nPqRVr=c6pM8tX849io1+CcqvCq2+8X5skJvh26+8xV_5tQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Apr 2, 2016 at 12:30 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> So the core of my complaint is that we need to fix things so that, whether
>> or not we are able to create the PGRES_FATAL_ERROR PGresult (and we'd
>> better consider the behavior when we cannot), ...
>
> BTW, the real Achilles' heel of any attempt to ensure sane behavior at
> the OOM limit is this possibility of being unable to create a PGresult
> with which to inform the client that we failed.
>
> I wonder if we could make things better by keeping around an emergency
> backup PGresult struct. Something along these lines:
>
> 1. Add a field "PGresult *emergency_result" to PGconn.
>
> 2. At the very beginning of any PGresult-returning libpq function, check
> to see if we have an emergency_result, and if not make one, ensuring
> there's room in it for a reasonable-size error message; or maybe even
> preload it with "out of memory" if we assume that's the only condition
> it'll ever be used for. If malloc fails at this point, just return NULL
> without doing anything or changing any libpq state. (Since a NULL result
> is documented as possibly caused by OOM, this isn't violating any API.)
>
> 3. Subsequent operations never touch the emergency_result unless we're
> up against an OOM, but it can be used to return a failure indication
> to the client so long as we leave libpq in a state where additional
> calls to PQgetResult would return NULL.
>
> Basically this shifts the point where an unreportable OOM could happen
> from somewhere in the depths of libpq to the very start of an operation,
> where we're presumably in a clean state and OOM failure doesn't leave
> us with a mess we can't clean up.
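
For readers following along, here is a minimal sketch of the three steps
quoted above. It is not libpq code: DemoConn, DemoResult, demo_malloc and
demo_get_result are hypothetical stand-ins for PGconn, PGresult, malloc
and a PGresult-returning function, and the demo_simulate_oom flag exists
only so the OOM path can be exercised deliberately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real libpq structs. */
typedef enum { DEMO_OK, DEMO_FATAL_ERROR } DemoStatus;

typedef struct DemoResult {
    DemoStatus status;
    char       errmsg[64];
} DemoResult;

typedef struct DemoConn {
    DemoResult *emergency_result;  /* step 1: preallocated spare result */
    bool        busy;              /* true while a result is still owed */
} DemoConn;

/* Test hook (an assumption of this sketch): force allocations to fail. */
bool demo_simulate_oom = false;

void *demo_malloc(size_t n)
{
    return demo_simulate_oom ? NULL : malloc(n);
}

/* Step 2: make sure the spare exists before doing any real work. */
bool demo_ensure_emergency(DemoConn *conn)
{
    assert(conn != NULL);
    if (conn->emergency_result != NULL)
        return true;

    DemoResult *r = demo_malloc(sizeof(*r));
    if (r == NULL)
        return false;              /* caller returns NULL, state untouched */

    r->status = DEMO_FATAL_ERROR;
    strcpy(r->errmsg, "out of memory");   /* preloaded message */
    conn->emergency_result = r;
    return true;
}

/* Stand-in for a result-returning function such as PQgetResult. */
DemoResult *demo_get_result(DemoConn *conn)
{
    assert(conn != NULL);
    if (!conn->busy)
        return NULL;               /* nothing pending */

    if (!demo_ensure_emergency(conn))
        return NULL;               /* OOM at the clean starting point */

    DemoResult *r = demo_malloc(sizeof(*r));
    if (r == NULL) {
        /* Step 3: OOM mid-operation, so hand over the spare instead. */
        r = conn->emergency_result;
        conn->emergency_result = NULL;
    } else {
        r->status = DEMO_OK;
        r->errmsg[0] = '\0';
    }

    conn->busy = false;            /* further calls now return NULL */
    return r;
}
```

In this sketch the caller owns every returned DemoResult and frees it, the
spare included; a fresh spare is allocated at the start of the next call.
A failure to allocate the spare itself leaves conn->busy unchanged, which
is the "return NULL without changing any state" case from step 2.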

I have moved this patch to the next CF for the time being. Since this is
a legitimate bug and not a feature, it should be fine to keep working on
this item even after the current CF ends.
--
Michael
