Re: pgbench bug / limitation

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, "Jawarilal, Manish" <Manish(dot)Jawarilal(at)dell(dot)com>
Subject: Re: pgbench bug / limitation
Date: 2020-06-16 04:54:44
Message-ID: alpine.DEB.2.22.394.2006150902260.646816@pseudo
Lists: pgsql-bugs


Hello David,

>> I suggest that we might as well get all the way in and dodge the
>> FD_SETSIZE limitation altogether, as per the attached utterly-untested
>> draft patch.
>
> I compiled this on Visual Studio 2017 and tested it. I didn't
> encounter any problems.
>
> The only thing I see in Winsock2.h that relies on FD_SETSIZE being the
> same size as the fd_set array is the FD_SET macro. So, I think it
> should be safe if these differ, like they will with your patch. We'll
> just need to make sure we don't use FD_SET in the future.
>
>> A remaining problem with this is that in theory, repeatedly applying
>> socket_has_input() after the wait could also be O(N^2) (unless FD_ISSET
>> is way smarter than I suspect it is).
>
> The FD_ISSET() just calls a function, so I don't know what's going on
> under the hood.
>
> #define FD_ISSET(fd, set) __WSAFDIsSet((SOCKET)(fd), (fd_set FAR *)(set))
>
> However, I don't see what else it could do other than loop over the
> array until it finds a match.
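
To make the O(N^2) concern concrete, here is a minimal sketch in portable
C. It models the documented Winsock fd_set layout (a count followed by an
unsorted array of socket handles, rather than a POSIX-style bitmap) and
assumes the membership test is a plain linear scan. That scan is only
David's guess about what __WSAFDIsSet() does; it is not confirmed. All
model_* names and MODEL_SETSIZE are hypothetical and are not part of
Winsock or pgbench.

```c
#include <stddef.h>

/* Hypothetical capacity, standing in for an enlarged FD_SETSIZE. */
#define MODEL_SETSIZE 1024

typedef unsigned long long model_socket;

/* Models the Winsock-style layout: a count plus an unsorted handle array. */
typedef struct
{
    unsigned int count;
    model_socket array[MODEL_SETSIZE];
} model_fd_set;

static void
model_zero(model_fd_set *set)
{
    set->count = 0;
}

/* Append a handle; returns 0 if the array is already full, else 1. */
static int
model_add(model_fd_set *set, model_socket s)
{
    if (set->count >= MODEL_SETSIZE)
        return 0;
    set->array[set->count++] = s;
    return 1;
}

/*
 * ASSUMPTION: membership is tested by a linear scan over the array.
 * If steps is not NULL, it accumulates the number of comparisons made.
 */
static int
model_isset(const model_fd_set *set, model_socket s, unsigned long *steps)
{
    for (unsigned int i = 0; i < set->count; i++)
    {
        if (steps != NULL)
            (*steps)++;
        if (set->array[i] == s)
            return 1;
    }
    return 0;
}

/* Total comparisons when each of n registered sockets is checked once. */
static unsigned long
model_scan_cost(unsigned int n)
{
    model_fd_set set;
    unsigned long steps = 0;

    model_zero(&set);
    for (unsigned int i = 0; i < n; i++)
        model_add(&set, 1000ULL + i);
    for (unsigned int i = 0; i < n; i++)
        model_isset(&set, 1000ULL + i, &steps);
    return steps;
}
```

Under that assumption, checking every one of n clients after a single
wait costs n(n+1)/2 comparisons, so model_scan_cost(128) comes to 8256
versus 2080 for 64 clients: doubling the clients roughly quadruples the
work.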

Independently of the wisdom of handling many client connections with just
one pgbench thread, I'm really wondering what goes on under the hood in
the Windows implementation of select().

AFAICS from online docs, the Windows native interfaces for waiting on I/O
are:

- WaitForMultipleObjects
with a MAXIMUM_WAIT_OBJECTS limit which is 64.

- WSAWaitForMultipleEvents
with a WSA_MAXIMUM_WAIT_EVENTS limit which is 64.

Then their (strange) implementation of POSIX select uses FD_SETSIZE which
is, you may have guessed, 64.

Although this is consistent, the Microsoft docs do indeed suggest that
FD_SETSIZE can be
extended, but then why would the underlying implementation do a better job
(handle more fds) than the native implementations? How can one really tell
that "it works"? Maybe it just waited for the first few objects? Maybe it
did some active scan on objects to check for their status? Maybe it forked
threads to do the waiting? Maybe something else?

Having some idea of what is really happening would help to know what is
best to do in pgbench.

I'd suggest the following test, with pgbench compiled with the extended
FD_SETSIZE for Windows:

script sleep.sql:
SELECT pg_sleep(150 - :client_id);

Then run something like the following under some debugger:

pgbench -c 128 -f "sleep.sql"

and look at where the process is when it is interrupted inside select().
I cannot run this test myself, because I do not have access to a Windows
host.

--
Fabien.
