Re: OOM in libpq and infinite loop with getCopyStart()

From: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: OOM in libpq and infinite loop with getCopyStart()
Date: 2016-03-09 14:54:01
Message-ID: 20160309175401.590c2f30@fujitsu
Lists: pgsql-hackers

Hello, Michael

Thanks a lot for the steps to reproduce you provided.

I tested your patch on Ubuntu Linux 14.04 LTS (GCC 4.8.4) and FreeBSD
10.2 RELEASE (Clang 3.4.1). In both cases the patch applies cleanly,
there are no warnings during compilation, and all regression tests
pass. A few files are not properly pgindent'ed, though:

```
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index c99f193..2769719 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -2031,7 +2031,7 @@ PQexecFinish(PGconn *conn)
conn->status == CONNECTION_BAD)
break;
else if ((conn->asyncStatus == PGASYNC_COPY_IN ||
- conn->asyncStatus == PGASYNC_COPY_OUT ||
+ conn->asyncStatus == PGASYNC_COPY_OUT ||
conn->asyncStatus == PGASYNC_COPY_BOTH) &&
result->resultStatus == PGRES_FATAL_ERROR)
break;
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 21a1d9b..280ca16 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -49,9 +49,9 @@ static int getParamDescriptions(PGconn *conn, int msgLength);
static int getAnotherTuple(PGconn *conn, int msgLength);
static int getParameterStatus(PGconn *conn);
static int getNotify(PGconn *conn);
-static int getCopyStart(PGconn *conn,
- ExecStatusType copytype,
- int msgLength);
+static int getCopyStart(PGconn *conn,
+ ExecStatusType copytype,
+ int msgLength);
static int getReadyForQuery(PGconn *conn);
static void reportErrorPosition(PQExpBuffer msg, const char *query,
int loc, int encoding);
```

Indeed, your patch solves the issues you described. Still, here is
something that concerns me:

```
$ gdb --args pg_receivexlog --verbose -D ./temp_data/

(gdb) b getCopyStart
Function "getCopyStart" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (getCopyStart) pending.
(gdb) r
Starting program: /usr/local/pgsql/bin/pg_receivexlog --verbose -D ./temp_data/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
pg_receivexlog: starting log streaming at 0/1000000 (timeline 1)

Breakpoint 1, getCopyStart (conn=0x610220, copytype=PGRES_COPY_BOTH, msgLength=3) at fe-protocol3.c:1398
1398 const char *errmsg = NULL;
(gdb) n
1400 result = PQmakeEmptyPGresult(conn, copytype);
(gdb)
1401 if (!result)
(gdb) p result = 0
$1 = (PGresult *) 0x0
(gdb) c
Continuing.
pg_receivexlog: could not send replication command "START_REPLICATION": out of memory
pg_receivexlog: disconnected; waiting 5 seconds to try again
pg_receivexlog: starting log streaming at 0/1000000 (timeline 1)

Breakpoint 1, getCopyStart (conn=0x610180, copytype=PGRES_COPY_BOTH, msgLength=3) at fe-protocol3.c:1398
1398 const char *errmsg = NULL;
(gdb) n
1400 result = PQmakeEmptyPGresult(conn, copytype);
(gdb) n
1401 if (!result)
(gdb) p result = 0
$2 = (PGresult *) 0x0
(gdb) c
Continuing.
pg_receivexlog: could not send replication command "START_REPLICATION": out of memory
pg_receivexlog: disconnected; waiting 5 seconds to try again
pg_receivexlog: starting log streaming at 0/1000000 (timeline 1)

Breakpoint 1, getCopyStart (conn=0x610180, copytype=PGRES_COPY_BOTH, msgLength=3) at fe-protocol3.c:1398
1398 const char *errmsg = NULL;
```
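In the session above, `p result = 0` makes it look as though PQmakeEmptyPGresult() returned NULL. The same fault-injection idea can be sketched in plain C with an allocator that fails on demand. This is only a minimal sketch under stated assumptions. `FakeResult`, `make_result()`, `fail_next_alloc` and `get_copy_start()` are hypothetical names chosen for illustration. They are loosely modelled on the `if (!result)` check in getCopyStart() and are not libpq's actual code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a PGresult; not libpq's real struct. */
typedef struct
{
    int         status;
} FakeResult;

/*
 * Hypothetical fault-injection switch: when set, the next allocation
 * fails once, mimicking "p result = 0" in the gdb session.
 */
static int  fail_next_alloc = 0;

static FakeResult *
make_result(int status)
{
    FakeResult *res;

    if (fail_next_alloc)
    {
        fail_next_alloc = 0;    /* fail only this one call */
        return NULL;
    }
    res = malloc(sizeof(FakeResult));
    if (res != NULL)
        res->status = status;
    return res;
}

/*
 * Hypothetical caller: a NULL result is turned into an error message
 * instead of being dereferenced.  Returns 0 on success, -1 on failure.
 */
static int
get_copy_start(int copytype, const char **errmsg)
{
    FakeResult *result = make_result(copytype);

    if (!result)
    {
        *errmsg = "out of memory";
        return -1;
    }
    /* ... the rest of the CopyBothResponse message would be parsed here ... */
    free(result);
    *errmsg = NULL;
    return 0;
}
```

Setting `fail_next_alloc = 1` before a call makes that call return -1 with `errmsg` set to "out of memory". The following call succeeds unless the flag is set again, which is effectively what the gdb session does on every retry.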

Granted, this behaviour is a bit better than the current one. But
basically it's the same infinite loop, only with pauses and warnings. I
wonder if this is the behaviour we really want. For instance, wouldn't
it be better just to terminate the application in the out-of-memory
case? "Let it crash", as Erlang programmers say.
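The two policies being weighed can be contrasted in a small sketch. A hypothetical `StreamAttemptFn` callback stands in for one streaming attempt. `stream_with_policy()`, `OomPolicy` and `oom_n_times()` are illustrative names, not pg_receivexlog's real structure. The `max_attempts` cap exists only so that the retry branch terminates in a test.

```c
/* Outcome of one (hypothetical) streaming attempt. */
typedef enum
{
    STREAM_OK,
    STREAM_OOM,                 /* e.g. PQmakeEmptyPGresult() returned NULL */
    STREAM_OTHER_ERROR          /* e.g. a dropped connection */
} StreamStatus;

/* How to react when an attempt fails with out of memory. */
typedef enum
{
    ON_OOM_RETRY,               /* behaviour in the session above: wait, retry */
    ON_OOM_GIVE_UP              /* "let it crash": stop at the first OOM */
} OomPolicy;

typedef StreamStatus (*StreamAttemptFn) (void *state);

/*
 * Run attempts until one succeeds, the policy says to give up, or
 * max_attempts is reached.  Returns the number of attempts made and
 * stores the last outcome in *last.  Real code would sleep between
 * retries (pg_receivexlog waits 5 seconds); that is omitted here.
 */
static int
stream_with_policy(StreamAttemptFn attempt, void *state, OomPolicy policy,
                   int max_attempts, StreamStatus *last)
{
    int         n;

    *last = STREAM_OTHER_ERROR;
    for (n = 1; n <= max_attempts; n++)
    {
        *last = attempt(state);
        if (*last == STREAM_OK)
            return n;
        if (*last == STREAM_OOM && policy == ON_OOM_GIVE_UP)
            return n;           /* caller would report the error and exit */
    }
    return max_attempts;
}

/* Hypothetical attempt: fails with OOM *state times, then succeeds. */
static StreamStatus
oom_n_times(void *state)
{
    int        *remaining = state;

    if (*remaining > 0)
    {
        (*remaining)--;
        return STREAM_OOM;
    }
    return STREAM_OK;
}
```

Suppose the allocation failure is persistent. With `ON_OOM_RETRY` the loop then stops only because of the artificial cap, which mirrors the endless retries in the session above. With `ON_OOM_GIVE_UP` it stops after the first OOM. A bounded-retry compromise is simply `ON_OOM_RETRY` with a small `max_attempts`.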

Best regards,
Aleksander
