Re: backtrace_on_internal_error

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: backtrace_on_internal_error
Date: 2023-12-08 19:33:16
Message-ID: 20231208193316.5ylgs4zb6zngwyg4@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-12-08 10:51:01 -0800, Andres Freund wrote:
> On 2023-12-08 13:46:07 -0500, Tom Lane wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> > > On 2023-12-08 13:23:50 -0500, Tom Lane wrote:
> > >> Hmm, don't suppose you have a way to reproduce that?
> >
> > > After a bit of trying, yes. I put an abort() into pgtls_open_client(), after
> > > initialize_SSL(). Connecting does result in:
> > > LOG: could not accept SSL connection: Success
> >
> > OK. I can dig into that, unless you're already on it?
>
> I think I figured it it out. Looks like we need to translate a closed socket
> (recvfrom() returning 0) to ECONNRESET or such.

I think we might just need to expand the existing branch for EOF:

if (r < 0)
ereport(COMMERROR,
(errcode_for_socket_access(),
errmsg("could not accept SSL connection: %m")));
else
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
errmsg("could not accept SSL connection: EOF detected")));

The openssl docs say:

The following return values can occur:

0

The TLS/SSL handshake was not successful but was shut down controlled and by the specifications of the TLS/SSL protocol. Call SSL_get_error() with the return value ret to find out the reason.
1

The TLS/SSL handshake was successfully completed, a TLS/SSL connection has been established.
<0

The TLS/SSL handshake was not successful because a fatal error occurred either at the protocol level or a connection failure occurred. The shutdown was not clean. It can also occur if action is needed to continue the operation for nonblocking BIOs. Call SSL_get_error() with the return value ret to find out the reason.

Which fits with my reproducer - due to the abort the connection was *not* shut
down via SSL in a controlled manner, therefore r < 0.

Hm, oddly enough, there's this tidbit in the SSL_get_error() manpage:

On an unexpected EOF, versions before OpenSSL 3.0 returned SSL_ERROR_SYSCALL,
nothing was added to the error stack, and errno was 0. Since OpenSSL 3.0 the
returned error is SSL_ERROR_SSL with a meaningful error on the error stack.

But I reproduced this with 3.1.

Seems like we should just treat errno == 0 as a reason to emit the "EOF
detected" message?

I wonder if we should treat send/recv returning 0 different from an error
message perspective during an established connection. Right now we produce
could not receive data from client: Connection reset by peer

because be_tls_read() sets errno to ECONNRESET - despite that not having been
returned by the OS. But I guess that's a topic for another day.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2023-12-08 19:45:23 Re: Emitting JSON to file using COPY TO
Previous Message Andres Freund 2023-12-08 18:51:01 Re: backtrace_on_internal_error