Re: errors with high connections rate

From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: "Pawel S(dot) Veselov" <pawel(dot)veselov(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: errors with high connections rate
Date: 2012-07-03 13:27:24
Message-ID: 4FF2F33C.5070105@ringerc.id.au
Lists: pgsql-general

On 07/03/2012 04:26 PM, Pawel S. Veselov wrote:

> That's the thing, no segfaults (dmesg), nothing in the server logs.
>
> It may as well be some sort of an anti-fork-bomb measure, only judging
> by the fact that with enough attempts, things do clear out, though I
> wish there would be some indication of that, and I'm still confused
> about the error code being ENOTCONN.
>

I've managed to produce the endpoint not connected errors with a little
test I wrote here. Only once so far and only during an abnormal test run
where I signalled the test workers as they were starting up, so that's
not really very helpful.

I have no problem using a little Python test program to create 800
connections in about a second. It forks some workers (100 by default)
which grab enough connections each to reach the target connection count.
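For anyone who wants to try something similar, here is a rough sketch of that kind of test, not the actual program. The names split_target, grab_connections and run are made up for illustration, and the connect argument is a placeholder for whatever callable your driver uses to open one connection. It assumes a Unix-like system, since it relies on os.fork.

```python
import os
import time


def split_target(target, workers):
    """Divide `target` connections across `workers` as evenly as possible."""
    base, extra = divmod(target, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def grab_connections(count, connect):
    """Call `connect` `count` times, collecting connections and error texts."""
    conns, errors = [], []
    for _ in range(count):
        try:
            conns.append(connect())
        except Exception as exc:  # report every failure, keep going
            errors.append(str(exc))
    return conns, errors


def run(connect, target=800, workers=100, hold_seconds=5):
    """Fork one worker per share; return how many workers saw errors."""
    pids = []
    for share in split_target(target, workers):
        pid = os.fork()
        if pid == 0:
            # Child: open our share, hold the connections, then exit.
            status = 1
            try:
                conns, errors = grab_connections(share, connect)
                for message in errors:
                    print(f"worker {os.getpid()}: {message}", flush=True)
                time.sleep(hold_seconds)
                status = 1 if errors else 0
            finally:
                os._exit(status)  # never fall back into the parent's loop
        pids.append(pid)
    failed = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if status != 0:
            failed += 1
    return failed
```

With psycopg2 installed, for example, connect could be something like lambda: psycopg2.connect("dbname=test"); the DSN there is only a placeholder. Each worker exits via os._exit so a failure in a child never continues into the parent's forking loop.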

Ooh, handy. I just triggered it again now. The "Transport endpoint is
not connected" messages were intermixed with some "FATAL: sorry, too
many clients already" messages. The PostgreSQL log is full of "FATAL:
sorry, too many clients already" messages intermixed with "LOG:
unexpected EOF on client connection" messages. Again it was an abnormal
run where I signalled my workers midway through startup.

Interesting, that. I've never seen it on a run where I don't send a
signal. You know what that makes me think? You're using a multithreaded
approach, and there's something going wrong in your app's innards. Yes,
that's a lot of hot air and handwaving, but it fits - you're getting an
error saying that psql is trying to operate on a socket that isn't there.

The fact that there's nothing in the system logs or Pg logs just adds
weight to that. I'm guessing you have a threading bug, possibly signal
related.
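As a reference point for what ENOTCONN means at the socket level, here is a small sketch (the function name is made up for illustration). It does not reproduce the failure in this thread; it only shows one ordinary way to get that error code, which is to ask a TCP socket that was created but never connected for its peer address.

```python
import errno
import socket


def errno_for_unconnected_peer_lookup():
    """Return the errno raised by getpeername() on an unconnected TCP socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.getpeername()  # no connect() was ever issued on this socket
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()
    return None  # only reached if the call unexpectedly succeeded


if __name__ == "__main__":
    code = errno_for_unconnected_peer_lookup()
    print(code == errno.ENOTCONN, errno.errorcode.get(code))
```

The point is that the descriptor is valid but has no established connection behind it, which is consistent with client code using a socket before its connect completed or after another thread or a signal handler disturbed it.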

--
Craig Ringer
