Re: pgbench: could not connect to server: Resource temporarily unavailable

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Kevin McKibbin <kevinmckibbin123(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: pgbench: could not connect to server: Resource temporarily unavailable
Date: 2022-08-21 21:15:01
Message-ID: 8864.1661116501@sss.pgh.pa.us
Lists: pgsql-performance

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 2022-08-20 Sa 23:20, Tom Lane wrote:
>> Kevin McKibbin <kevinmckibbin123(at)gmail(dot)com> writes:
>>> What's limiting my DB from allowing more connections?

> The first question in my mind from the above is where this postgres
> instance is actually listening. Is it really /var/run/postgresql? Its
> postmaster.pid will tell you. I have often seen client programs pick up
> a system libpq which is compiled with a different default socket directory.

I wouldn't think that'd explain a symptom of some connections succeeding
and others not within the same pgbench run.

I tried to duplicate this behavior locally (on RHEL8) and got something
interesting. After increasing the server's max_connections to 1000,
I can do

$ pgbench -S -c 200 -j 100 -t 100 bench

and it goes through fine. But:

$ pgbench -S -c 200 -j 200 -t 100 bench
pgbench (16devel)
starting vacuum...end.
pgbench: error: connection to server on socket "/tmp/.s.PGSQL.5440" failed: Resource temporarily unavailable
Is the server running locally and accepting connections on that socket?
pgbench: error: could not create connection for client 154

So whatever is triggering this has to do not with the server,
but with how many threads are created inside pgbench. I notice
also that sometimes it works, making it seem like possibly a race
condition. Either that or there's some limitation on how fast
threads within a process can open sockets.
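
For anyone who wants to poke at the "many threads opening sockets at once" theory outside pgbench, here is a rough Python sketch. It is not pgbench or libpq code. The throwaway listener, its backlog value, and the names connect_burst and demo are all made up for illustration, and whether any EAGAIN shows up at all will depend on the kernel and platform. Each thread makes one non-blocking connect() to a local Unix socket, released together by a barrier, and the errno results are tallied.

```python
import errno
import os
import socket
import tempfile
import threading
from collections import Counter


def connect_burst(path, n_threads):
    """Have n_threads threads each attempt one non-blocking connect() to path."""
    results = Counter()
    lock = threading.Lock()
    socks = []
    barrier = threading.Barrier(n_threads)

    def worker():
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.setblocking(False)
        barrier.wait()                  # release all threads at about the same time
        code = s.connect_ex(path)       # returns the errno value instead of raising
        label = "OK" if code == 0 else errno.errorcode.get(code, str(code))
        with lock:
            results[label] += 1
            socks.append(s)             # keep sockets open until every thread is done

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for s in socks:
        s.close()
    return results


def demo(n_threads=50, backlog=5):
    """Hypothetical harness: a listener that never calls accept()."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sock")
        srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        srv.bind(path)
        srv.listen(backlog)
        try:
            return connect_burst(path, n_threads)
        finally:
            srv.close()


if __name__ == "__main__":
    print(demo())
```

The counter maps "OK" or an errno name to the number of threads that saw it, so varying n_threads shows whether the failure rate tracks the thread count.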

Also, I determined that libpq's connect() call is failing synchronously
(we get EAGAIN directly from the connect() call, not later). I wondered
if libpq should accept EAGAIN as a synonym for EINPROGRESS, but no:
that just makes it fail on the next touch of the socket.
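
To spell out the distinction being drawn, here is a simplified Python sketch of how a non-blocking connect() result might be classified. This is not libpq's actual code, the function name is hypothetical, and it ignores cases such as EINTR. The point is only that EINPROGRESS means "keep going and poll the socket", whereas EAGAIN is treated as a plain failure.

```python
import errno


def classify_connect(code):
    """Classify the errno-style result of a non-blocking connect() attempt."""
    if code == 0:
        return "connected"      # connect() completed immediately
    if code == errno.EINPROGRESS:
        return "in_progress"    # wait for the socket to become writable
    # Everything else, EAGAIN included, is reported as a failure here.
    return "failed"
```

Moving EAGAIN into the "in_progress" branch is the experiment described above, and per that experiment the failure then just shows up at the next operation on the socket.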

The only documented reason for connect(2) to fail with EAGAIN is

EAGAIN Insufficient entries in the routing cache.

which seems pretty unlikely to be the issue here, since all these
connections are being made to the same local address.

On the whole this is smelling more like a Linux kernel bug than
anything else.

regards, tom lane
