Re: OOM in libpq and infinite loop with getCopyStart()

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: OOM in libpq and infinite loop with getCopyStart()
Date: 2016-03-09 15:12:34
Message-ID: 20160309151234.GA963998@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Aleksander Alekseev wrote:

> pg_receivexlog: could not send replication command "START_REPLICATION":
> out of memory pg_receivexlog: disconnected; waiting 5 seconds to try
> again pg_receivexlog: starting log streaming at 0/1000000 (timeline 1)
>
> Breakpoint 1, getCopyStart (conn=0x610180, copytype=PGRES_COPY_BOTH,
> msgLength=3) at fe-protocol3.c:1398 1398 const char
> *errmsg = NULL;
> ```
>
> Granted this behaviour is a bit better then the current one. But
> basically it's the same infinite loop only with pauses and warnings. I
> wonder if this is a behaviour we really want. For instance wouldn't it
> be better just to terminate an application in out-of-memory case? "Let
> it crash" as Erlang programmers say.

Hmm. It would be useful to retry in the case that there is a chance
that the program releases memory and can continue later. But if it will
only stay there doing nothing other than retrying, then that obviously
will not happen. One situation where this might help is if the overall
*system* is short on memory and we expect that situation to resolve
itself after a while -- after all, if the system is so loaded that it
can't allocate a few more bytes for the COPY message, then odds are that
other things are also crashing and eventually enough memory will be
released that pg_receivexlog can continue.

On the other hand, if the system is so loaded, perhaps it's better to
"let it crash" and have it restart later -- presumably once the admin
notices the problem and restarts it manually after cleaning up the mess.

If all programs are well behaved and nothing crashes when OOM but they
all retry instead, then everything will continue to retry infinitely and
make no progress. That cannot be good.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-03-09 15:31:32 Re: the include-timestamp data returned by pg_logical_slot_peek_changes() is always 2000-01-01 in 9.5.1
Previous Message Magnus Hagander 2016-03-09 15:08:37 Re: Crash with old Windows on new CPU