Re: WL_SOCKET_ACCEPT fairness on Windows

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WL_SOCKET_ACCEPT fairness on Windows
Date: 2023-04-01 01:35:29
Message-ID: 20230401013529.eyvh7jcaqxz4qn7o@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-04-01 13:42:21 +1300, Thomas Munro wrote:
> Commit 7389aad6 started using WaitEventSetWait() to wait for incoming
> connections. Before that, we used select(), for which we have our own
> implementation for Windows.
>
> While hacking on patches to rip a bunch of unused Win32 socket wrapper
> code out, I twigged that the old pgwin32_select() code was careful to
> report multiple sockets at once by brute force polling of all of them,
> while WaitEventSetWait() doesn't do that: it reports just one event,
> because that's what the Windows WaitForMultipleEvents() syscall does.
> I guess this means you can probably fill up the listen queue of server
> socket 1 to prevent server socket 2 from ever being serviced, whereas
> on Unix we'll accept one connection at a time from each in round-robin
> fashion.

It does indeed sound like it'd behave that way:

> If bWaitAll is FALSE, the return value minus WAIT_OBJECT_0 indicates the
> lpHandles array index of the object that satisfied the wait. If more than
> one object became signaled during the call, this is the array index of the
> signaled object with the smallest index value of all the signaled objects.

I wonder if we ought to bite the bullet and replace the use of
WaitForMultipleObjects() with RegisterWaitForSingleObject() and then use
GetQueuedCompletionStatus() to wait. The fairness issue here is a motivation,
but the bigger one is that that'd get us out from under the
MAXIMUM_WAIT_OBJECTS (IIRC 64) limit. Afaict that'd also allow us to read
multiple notifications at once, using GetQueuedCompletionStatusEx().

Medium term that'd also be a small step towards using readiness based APIs in
a few places...

> I think we could get the same effect as pgwin32_select() more cheaply
> by doing the initial WaitForMultipleEvents() call with the caller's
> timeout exactly as we do today, and then, while we have space,
> repeatedly calling
> WaitForMultipleEvents(handles=&events[last_signaled_event_index + 1],
> timeout=0) until it reports timeout.

That would make sense, and should indeed be reasonable cost-wise.

> I mention this now because I'm not sure whether to consider this an
> 'open item' for 16, or merely an enhancement for 17. I guess the
> former, because someone might call that a new denial of service
> vector. On the other hand, if you fill up the listen queue for socket
> 1 with enough vigour, you're also denying service to socket 1, so I
> don't know if it's worth worrying about. Opinions on that?

I'm not sure either. It doesn't strike me as a particularly relevant
bottleneck. And the old approach of doing more work for every single
connection also made many connections worse, I think?

> diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
> index f4123e7de7..cc7b572008 100644
> --- a/src/backend/storage/ipc/latch.c
> +++ b/src/backend/storage/ipc/latch.c
> @@ -2025,6 +2025,8 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
> */
> cur_event = (WaitEvent *) &set->events[rc - WAIT_OBJECT_0 - 1];
>
> +loop:
> +

As far as I can tell, we'll never see WL_LATCH_SET or WL_POSTMASTER_DEATH. I
think it'd look cleaner to move the body of if (cur_event->events & WL_SOCKET_MASK)
into a separate function that we then also can call further down.

> occurred_events->pos = cur_event->pos;
> occurred_events->user_data = cur_event->user_data;
> occurred_events->events = 0;
> @@ -2044,6 +2046,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
> occurred_events->events = WL_LATCH_SET;
> occurred_events++;
> returned_events++;
> + nevents--;
> }
> }
> else if (cur_event->events == WL_POSTMASTER_DEATH)
> @@ -2063,6 +2066,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
> occurred_events->events = WL_POSTMASTER_DEATH;
> occurred_events++;
> returned_events++;
> + nevents--;
> }
> }
> else if (cur_event->events & WL_SOCKET_MASK)
> @@ -2124,6 +2128,36 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
> {
> occurred_events++;
> returned_events++;
> + nevents--;
> + }
> + }

Seems like we could use returned_events to get nevents in the way you want it,
without adding even more ++/-- to each of the different events?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2023-04-01 03:00:04 Re: WL_SOCKET_ACCEPT fairness on Windows
Previous Message houzj.fnst@fujitsu.com 2023-04-01 01:24:05 RE: Non-superuser subscription owners