Re: OOM in libpq and infinite loop with getCopyStart()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: OOM in libpq and infinite loop with getCopyStart()
Date: 2016-04-01 03:48:32
Message-ID: 25412.1459482512@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I thought about this patch a bit more...

When I first looked at the patch, I didn't believe that it worked at
all: it failed to return a PGRES_COPY_XXX result to the application,
and yet nonetheless switched libpq's asyncStatus to PGASYNC_COPY_XXX?
Wouldn't things be hopelessly confused? I tried it out and saw that
indeed it seemed to work in psql, and after tracing through that found
that psql has no idea what's going on, but when psql issues its next
command PQexecStart manages to get us out of the copy state (barring
more OOM failures, anyway). That seems a bit accidental, though,
and for sure it wasn't documented in the patch. In any case, having
libpq in that state would have side-effects on how it responds to
application actions, which is the core of my complaint about
PQgetResult behavior. Those side effects only make sense if the app
knows (or should have known) that the connection is in COPY state.

I think it might be possible to get saner behavior if we invent a
set of new asyncStatus values PGASYNC_COPY_XXX_FAILED, having the
semantics that we got the relevant CopyStart message from the backend
but were unable to report it to the application. PQexecStart would
treat these the same as the corresponding normal PGASYNC_COPY_XXX
states and do what's needful to get out of the copy mode. But in
PQgetResult, we would *not* treat those states as a reason to return a
PGRES_COPY_XXX result to the application. Probably we'd just return
NULL, with the implication being "we're done so far as you're concerned,
please give a new command". I might be underthinking it though.

Anyway, the short of my review is that we need more clarity of thought
about what state libpq is in after a failure like this, and what that
state looks like to the application, and how that should affect how
libpq reacts to application calls.

I'll set this back to Waiting on Author ...

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2016-04-01 03:49:31 Re: IF (NOT) EXISTS in psql-completion
Previous Message Amit Kapila 2016-04-01 03:31:46 Re: Parallel Queries and PostGIS