Cryptic error message in low-memory conditions

From: Daniel Farina <daniel(at)heroku(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Cryptic error message in low-memory conditions
Date: 2011-08-26 20:57:29
Message-ID: CAAZKuFaxdPccCs9+2hTbMM5iwPZP6C494sAVz6qZhOPcYSGkTQ@mail.gmail.com
Lists: pgsql-hackers

Hello list,

This is something that I've only recently somewhat pinned down to a cause...

Some Postgres servers will error out for a while with the following
error message:

"expected authentication request from server, but received c"

If one uses Their Favorite Search Engine, this message is scattered
around the internet, all in reference to Postgres, I think, but none
of the top results seem to have any lucid responses or cause listed.
We've seen this reproduce -- sometimes for minutes at a time -- and
after catching one in the act I am reasonably confident that one
common cause of this is systems that are low on memory, which I
confirmed by looking at postgres logs and matching them up against our
monitoring system.

Critical statistics first: the systems run Linux with overcommit off,
so malloc returns NULL now and again. There is no OOM killer activity.
SSL is the transport, and SQL role password authentication is in use.
There is no swap.

Here's an example of the various kinds of failure one can get from
connecting to a system that is low on memory:

2011-08-26 16:03:06 | INFO "psql? failed with exception #<PGError:
FATAL: out of memory
DETAIL: Failed on request of size 488.
>"
2011-08-26 16:02:27 | INFO "psql? failed with exception #<PGError:
expected authentication request from server, but received c
>"
2011-08-26 16:01:51 | INFO "psql? failed with exception #<PGError:
expected authentication request from server, but received c
>"
2011-08-26 16:01:15 | INFO "psql? failed with exception #<PGError:
expected authentication request from server, but received c
>"
2011-08-26 16:00:39 | INFO "psql? failed with exception #<PGError:
expected authentication request from server, but received c
>"
2011-08-26 16:00:01 | INFO "psql? failed with exception #<PGError:
expected authentication request from server, but received c
>"
2011-08-26 15:59:25 | INFO "psql? failed with exception #<PGError:
expected authentication request from server, but received c
>"
2011-08-26 15:58:48 | INFO "psql? failed with exception #<PGError:
expected authentication request from server, but received c
>"
2011-08-26 15:58:12 | INFO "psql? failed with exception #<PGError:
FATAL: out of memory
>"

On the backend side, one can see that there is often a failure to
fork, which is basically expected in this condition. Various
statements also report OOM.

As the sample shows, the usual error message says nothing about being
out of memory, so normally one gets no explicit indication that the
system is out of memory but otherwise responsive. This puts someone
doing monitoring (like us) in a tricky position: the user of the
database is free to use their memory -- that's what it's for -- but we
cannot determine that the server is basically online, if fully
utilized. This defeats the ever-common "authenticate and run SELECT 1;"
style of basic monitoring frequently used to determine the most basic
levels of uptime.

If the 'out of memory' condition were reported most of the time, we
could act differently, but for now we basically have to assume that
postgres is offline and poke around. It's also interesting to note
that the systems are basically responsive (ssh always seems able to
fork, and as I poke around, tools like 'ls' et al. seem to be fine),
and sometimes the load average isn't even extreme -- a leaky
application with too many connections can cause this, so it's not
like every last scrap of memory has been consumed.
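
To make the monitoring problem concrete, here is a minimal sketch in
Python of how a probe might classify the client-side errors quoted
above. The names (ProbeResult, classify_probe) are hypothetical and
not part of any Postgres tool. Treating the cryptic authentication
message as a sign of memory pressure is an assumption based only on
the correlation described in this message, not a documented
guarantee.

```python
from enum import Enum


class ProbeResult(Enum):
    """Hypothetical outcomes for an 'authenticate and run SELECT 1' probe."""
    UP = "up"
    OUT_OF_MEMORY = "out-of-memory"
    SUSPECTED_OUT_OF_MEMORY = "suspected-out-of-memory"
    UNKNOWN_FAILURE = "unknown-failure"


# Substrings taken from the client-side errors quoted in this message.
EXPLICIT_OOM = "out of memory"
CRYPTIC_AUTH = "expected authentication request from server, but received"


def classify_probe(error_message):
    """Classify a probe result from the client's error text.

    error_message is None when the probe succeeded, otherwise the error
    string reported by the client library.
    """
    if error_message is None:
        return ProbeResult.UP
    text = error_message.lower()
    if EXPLICIT_OOM in text:
        # The server said so explicitly: it is up but cannot allocate.
        return ProbeResult.OUT_OF_MEMORY
    if CRYPTIC_AUTH in text:
        # Assumption: on the systems described here this message
        # coincided with low memory, so flag it rather than report
        # the server as down.
        return ProbeResult.SUSPECTED_OUT_OF_MEMORY
    return ProbeResult.UNKNOWN_FAILURE
```

A probe built this way could page differently for "suspected out of
memory" than for an unknown failure, but it remains a heuristic: the
cryptic message by itself does not prove the cause.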

--
fdr
