Re: REL_15_STABLE: pgbench tests randomly failing on CI, Windows only

From: Noah Misch <noah(at)leadboat(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REL_15_STABLE: pgbench tests randomly failing on CI, Windows only
Date: 2023-10-11 03:40:26
Message-ID: 20231011034026.75@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Oct 08, 2023 at 10:00:03PM -0700, David G. Johnston wrote:
> On Sun, Oct 8, 2023 at 9:10 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > I didn't think of any phrasing that clearly explained things without the
> > reader consulting the code. I considered these:

I'm leaning toward:

"socket file descriptor out of range for select(): %d" [style guide violation]

A true style guide purist might bury it in a detail message:

ERROR: socket file descriptor out of range: %d
DETAIL: select() accepts file descriptors from 0 to 1023, inclusive, in this build.
HINT: Try fewer concurrent database clients.

> > "socket file descriptor out of range: %d" [what range?]
> >
> Quick drive-by...but it seems that < 0 is a distinctly different problem
> than exceeding FD_SETSIZE. I'm unsure what would cause the former but the
> error for the later seems like:
>
> error: "Requested socket file descriptor %d exceeds connection limit of
> %d", fd, FD_SETSIZE-1
> hint: Reduce the requested number of concurrent connections
>
> In short, the concept of range doesn't seem applicable here. There is an
> error state at the max, and some invalid system state condition where the
> position within a set is somehow negative. These should be separated -
> with the < 0 check happening first.

I view it as: the range of select()-able file descriptors is [0,FD_SETSIZE).
Negative is out of range.

> And apparently this condition isn't applicable when dealing with jobs in
> connect_slot?

For both pgbench.c and parallel_slot.c, there are sufficient negative-FD
checks elsewhere in the file. Ideally, either both files would have redundant
checks, or neither file would. I didn't feel the need to mess with that part
of the status quo.

> Also, I see that for connections we immediately issue FD_SET
> so this check on the boundary of the file descriptor makes sense. But the
> remaining code in connect_slot doesn't involve FD_SET so the test for the
> file descriptor falling within its maximum seems to be coming out of
> nowhere. Likely this is all good, and the lack of symmetry is just due to
> the natural progressive development of the code, but it stands out to the
> uninitiated looking at this patch.

True. The approach in parallel_slot.c is to check the FD number each time it
opens a socket, after which its loop with FD_SET() doesn't recheck. That's a
bit more efficient than the pgbench.c way, because each file may call FD_SET()
many times per socket. Again, I didn't mess with that part of the status quo.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-10-11 03:40:28 Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag
Previous Message Andres Freund 2023-10-11 03:39:29 Re: stopgap fix for signal handling during restore_command