Re: pgbench stopped supporting large number of client connections on Windows

From: Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench stopped supporting large number of client connections on Windows
Date: 2020-11-08 14:59:19
Message-ID: e5a09b790db21356376b6e73673aa07c@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020-11-07 01:01, Fabien COELHO wrote:
> Hello Marina,

Hello, Fabien!

Thank you for your comments!

>> While trying to test a patch that adds a synchronization barrier in
>> pgbench [1] on Windows,
>
> Thanks for trying that, I do not have a windows setup for testing, and
> the sync code I wrote for Windows is basically blind coding:-(

FYI:

1) It looks like pgbench will no longer support Windows XP due to the
function DeleteSynchronizationBarrier. From
https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletesynchronizationbarrier
:

Minimum supported client: Windows 8 [desktop apps only]
Minimum supported server: Windows Server 2012 [desktop apps only]

On Windows Server 2008 R2 (MSVC 2013) the 6-th version of the patch [1]
has compiled without (new) warnings, but when running pgbench I got the
following error:

The procedure entry point DeleteSynchronizationBarrier could not be
located in the dynamic link library KERNEL32.dll.

2) On Windows Server 2019 (MSVC 2019) the 6-th version of the patch [1]
with fix_max_client_conn_on_Windows.patch has compiled without (new)
warnings. I made a few runs (-M prepared -c 100 -j 10 -T 10 -P1 -S) with
and without your patches. On Linux (-M prepared -c 1000 -j 500 -T 10 -P1
-S) your patches fix problems with progress reports as in [2], but on
Windows I did not notice such changes, see attached
pgbench_runs_linux_vs_windows.zip.

>> The almost same thing happens with reindexdb and vacuumdb (build on
>> commit [3]):
>
> Windows fd implementation is somehow buggy because it does not return
> the smallest number available, and then with the assumption that
> select uses a dense array indexed with them (true on linux, less so on
> Windows which probably uses a sparse array), so that the number gets
> over the limit, even if less are actually used, hence the catch, as
> you noted.

I agree with you. It looks like the structure fd_set just contains used
sockets by this application on Windows, and the macro FD_SETSIZE is used
only here.

From
https://docs.microsoft.com/en-us/windows/win32/api/winsock/ns-winsock-fd_set
:

typedef struct fd_set {
u_int fd_count;
SOCKET fd_array[FD_SETSIZE];
} fd_set, FD_SET, *PFD_SET, *LPFD_SET;

From
https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2
:

The maximum number of sockets that a Windows Sockets application can use
is not affected by the manifest constant FD_SETSIZE. This value defined
in the Winsock2.h header file is used in constructing the FD_SET
structures used with select function.

>> IIUC the checks below are not correct on Windows, since on this system
>> sockets can have values equal to or greater than FD_SETSIZE (see
>> Windows documentation [4] and pgbench debug output in attached
>> pgbench_debug.txt).
>
> Okay.
>
> But then, how may one detect that there are too many fds in the set?
>
> I think that an earlier version of the code needed to make assumptions
> about the internal implementation of windows (there is a counter
> somewhere in windows fd_set struct), which was rejected because if was
> breaking the interface. Now your patch is basically resurrecting that.

I tried to keep the behaviour "we check if the socket value can be used
in select() at runtime", but now I will also read that thread...

> Why not if there is no other solution, but this is quite depressing,
> and because it breaks the interface it would be broken if windows
> changed its internals for some reason:-(

It looks like if the internals of the structure fd_set are changed, we
will also have problems with the function pgwin32_select from
src/backend/port/win32/socket.c, because it uses fd_set.fd_count too?..

(I'm writing responses to the rest of your comments but it takes
time...)

[1]
https://www.postgresql.org/message-id/alpine.DEB.2.22.394.2011021726390.989361%40pseudo
[2]
https://www.postgresql.org/message-id/20200227185129.hikscyenomnlrord%40alap3.anarazel.de

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
pgbench_runs_linux_vs_windows.zip application/zip 15.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-11-08 15:47:17 Re: Yet another (minor) fix in BRIN
Previous Message Andy Fan 2020-11-08 14:52:43 Re: Hybrid Hash/Nested Loop joins and caching results from subplans