pgbench stopped supporting large number of client connections on Windows

From: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: pgbench stopped supporting large number of client connections on Windows
Date: 2020-11-06 20:34:53
Message-ID: 8225e78650dd69f69c8cff37ecce9a09@postgrespro.ru
Lists: pgsql-hackers

Hello, hackers!

While trying to test a patch that adds a synchronization barrier in
pgbench [1] on Windows, I found that since the commit "Use ppoll(2), if
available, to wait for input in pgbench." [2] I cannot use a large
number of client connections in pgbench on my Windows virtual machines
(Windows Server 2008 R2 and Windows 2019), for example:

> bin\pgbench.exe -c 90 -S -T 3 postgres
starting vacuum...end.
too many client connections for select()

Almost the same thing happens with reindexdb and vacuumdb (built on
commit [3]):

> bin\reindexdb.exe -j 95 postgres
reindexdb: fatal: too many jobs for this platform -- try 90

> bin\vacuumdb.exe -j 95 postgres
vacuumdb: vacuuming database "postgres"
vacuumdb: fatal: too many jobs for this platform -- try 90

IIUC the checks below are not correct on Windows, since on this system
sockets can have values equal to or greater than FD_SETSIZE (see Windows
documentation [4] and pgbench debug output in attached
pgbench_debug.txt).

src/bin/pgbench/pgbench.c, the function add_socket_to_set:
    if (fd < 0 || fd >= FD_SETSIZE)
    {
        /*
         * Doing a hard exit here is a bit grotty, but it doesn't seem worth
         * complicating the API to make it less grotty.
         */
        pg_log_fatal("too many client connections for select()");
        exit(1);
    }

src/bin/scripts/scripts_parallel.c, the function ParallelSlotsSetup:
    /*
     * Fail and exit immediately if trying to use a socket in an
     * unsupported range.  POSIX requires open(2) to use the lowest
     * unused file descriptor and the hint given relies on that.
     */
    if (PQsocket(conn) >= FD_SETSIZE)
    {
        pg_log_fatal("too many jobs for this platform -- try %d", i);
        exit(1);
    }

I tried to fix this; see the attached fix_max_client_conn_on_Windows.patch
(based on commit [3]). I checked it with reindexdb and vacuumdb, and it
works for simple databases (1025 jobs are rejected and 1024 jobs are
OK). Unfortunately, pgbench got connection errors when it tried
to use 1000 jobs on my virtual machines, although there were no errors
with fewer jobs (500) and the same number of clients (1000)...

Any suggestions are welcome!

[1]
https://www.postgresql.org/message-id/flat/20200227180100.zyvjwzcpiokfsqm2%40alap3.anarazel.de
[2]
https://github.com/postgres/postgres/commit/60e612b602999e670f2d57a01e52799eaa903ca9
[3]
https://github.com/postgres/postgres/commit/48e1291342dd7771cf8c67aa1d7ec1f394b95dd8
[4]
https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select
From that page: "Internally, socket handles in an fd_set structure are not
represented as bit flags as in Berkeley Unix. Their data representation is
opaque."

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment                              Content-Type  Size
fix_max_client_conn_on_Windows.patch    text/x-diff   3.5 KB
pgbench_debug.txt                       text/plain    4.5 KB
