Re: [patch] helps fe-connect.c handle -EINTR more gracefully

From: David Ford <david(at)blue-labs(dot)org>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [patch] helps fe-connect.c handle -EINTR more gracefully
Date: 2001-10-27 00:15:30
Message-ID: 3BD9FCA2.90903@blue-labs.org
Lists: pgsql-hackers

>No, it should *not* look like that. The fe-connect.c code is designed
>to move on as soon as it's convinced that the kernel has accepted the
>connection request. We use a non-blocking connect() call and later
>wait for connection complete by probing the select() status. Looping
>on the connect() itself would be a busy-wait, which would be antisocial.
>

The fe-connect.c code moves on regardless of the completion of the
connect() if it has been interrupted.

To simplify, in a program without SIGALRM events, PQconnect* won't be
interrupted. The connect() call will complete properly.

In a program with SIGALRM events, the call can be interrupted inside
connect(). If the signal handler were installed with SA_RESTART, the
kernel would normally restart the interrupted system call
automatically. A handler installed without SA_RESTART -doesn't- get
that automatic restart, so connect() returns -1 with errno set to EINTR.
This means the programmer needs to check for -1/errno=EINTR and jump
back into connect() himself. There is no busy-wait or antisocial
behavior to worry about: your program was in the middle of connect()
when it was interrupted, and you're simply jumping back to where you
left off.

It doesn't matter whether the connect is blocking or non-blocking:
EINTR must be handled if SIGALRM events are employed. A fast enough
event timer with a non-blocking connect will also be susceptible to
EINTR.

EINTR is distinctly different from EINPROGRESS. If they were the same
then there would be a problem. EINTR should be handled by jumping back
into the connect() call; it is re-entrant and designed for this.
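
A rough sketch of the retry I have in mind follows. The helper name
connect_retry_eintr is hypothetical. One assumption beyond what is said
above: on some systems the repeated connect() may report EISCONN
because the kernel finished the handshake while the signal handler
ran, so the sketch treats that as success after a retry. Every other
error, including EINPROGRESS and EALREADY, is handed back to the
caller.

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (illustrative name, not from fe-connect.c).
 * Re-issues connect() for as long as it fails with EINTR.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int connect_retry_eintr(int fd, const struct sockaddr *addr, socklen_t len)
{
    int rc;
    int retried = 0;

    for (;;) {
        rc = connect(fd, addr, len);
        if (rc == -1 && errno == EINTR) {
            retried = 1;              /* interrupted by a signal: go again */
            continue;
        }
        break;
    }

    /* Assumption: EISCONN after a retry means the connection completed. */
    if (rc == -1 && errno == EISCONN && retried)
        return 0;

    return rc;
}
```

The loop only spins when a signal actually arrived, so it is not a
busy-wait in the usual sense.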

Regardless, you don't wait for the connection to complete. The code
following the connect() call returns failure for every -1 result from
connect() unless it is EINPROGRESS or EWOULDBLOCK. select() is -not-
used in fe-connect.c. It is possible with the current code for the
connection to fail unnoticed in non-blocking mode. Reason: you call
connect() in non-blocking mode, break out of the section on EINPROGRESS,
and continue assuming that the connection will be successful.

EINPROGRESS
       The socket is non-blocking and the connection cannot
       be completed immediately. It is possible to
       select(2) or poll(2) for completion by selecting
       the socket for writing. After select indicates
       writability, use getsockopt(2) to read the SO_ERROR
       option at level SOL_SOCKET to determine whether
       connect completed successfully (SO_ERROR is zero)
       or unsuccessfully (SO_ERROR is one of the usual
       error codes listed here, explaining the reason for
       the failure).
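
The procedure in that man page excerpt could look roughly like this.
It is only a sketch: the helper name wait_connect_complete and the
timeout parameter are my own inventions, and I use poll(2) rather than
select(2), which the excerpt allows.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (illustrative name, not from fe-connect.c).
 * Call after a non-blocking connect() has returned -1/EINPROGRESS.
 * Waits until the socket is writable, then reads SO_ERROR.
 * Returns 0 if the connection succeeded, -1 with errno set on
 * failure or timeout.
 */
int wait_connect_complete(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;
    int so_error = 0;
    socklen_t len = sizeof so_error;

    pfd.fd = fd;
    pfd.events = POLLOUT;             /* writability signals completion */
    pfd.revents = 0;

    /* Simplification: an EINTR restarts the wait with the full timeout. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);

    if (rc == 0) {
        errno = ETIMEDOUT;            /* nothing happened within timeout_ms */
        return -1;
    }
    if (rc == -1)
        return -1;

    /* SO_ERROR is zero on success, else the reason connect failed. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) == -1)
        return -1;
    if (so_error != 0) {
        errno = so_error;
        return -1;
    }
    return 0;
}
```

Only after this returns 0 would the code be safe to move on to the SSL
handling.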

The socket is not checked any further after the connect(). The code
should not continue on into the SSL handling until you're sure that the
socket is ready for operation.

I am getting EINTR from a non-blocking connect because my event timer
happens to fire in the middle of the connect() call. Just because you
set FIONBIO on the socket doesn't mean that connect() can't be
interrupted.

David
