Re: SSL renegotiation and other related woes

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSL renegotiation and other related woes
Date: 2015-02-11 23:41:37
Message-ID: 20150211234137.GT21017@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2015-02-05 23:03:02 +0200, Heikki Linnakangas wrote:
> On 01/26/2015 12:14 PM, Andres Freund wrote:
> >Hi,
> >
> >When working on getting rid of ImmediateInterruptOK I wanted to verify
> >that ssl still works correctly. Turned out it didn't. But neither did it
> >in master.
> >
> >Turns out there's two major things we do wrong:
> >
> >1) We ignore the rule that once called and returning
> > SSL_ERROR_WANTS_(READ|WRITE) SSL_read()/write() have to be called
> > again with the same parameters. Unfortunately that rule doesn't mean
> > just that the same parameters have to be passed in, but also that we
> > can't just constantly switch between _read()/write(). Especially
> > nonblocking backend code (i.e. walsender) and the whole frontend code
> > violate this rule.
>
> I don't buy that. Googling around, and looking at the man pages, I just
> don't see anything to support that claim. I believe it is supported to
> switch between reads and writes.

There's at least two implementations that seem to have workaround to
achieve something like that. Apache's mod_ssl and the ssl layer for
libevent.

> >2) We start renegotiations in be_tls_write() while in nonblocking mode,
> > but don't properly retry to handle socket readyness. There's a loop
> > that retries handshakes twenty times (???), but what actually is
> > needed is to call SSL_get_error() and ensure that there's actually
> > data available.
>
> Yeah, that's just crazy.
>
> Actually, I don't think we need to call SSL_do_handshake() at all. At least
> on modern OpenSSL versions. OpenSSL internally uses a state machine, and
> SSL_read() and SSL_write() calls will complete the handshake just as well.

Some blaming seems to show that that's been the case for a long time.

> >2) is easy enough to fix, but 1) is pretty hard. Before anybody says
> >that 1) isn't an important rule: It reliably causes connection aborts
> >within a couple renegotiations. The higher the latency the higher the
> >likelihood of aborts. Even locally it doesn't take very long to
> >abort. Errors usually are something like "SSL connection has been closed
> >unexpectedly" or "SSL Error: sslv3 alert unexpected message" and a host
> >of other similar messages. There's a couple reports of those in the
> >archives and I've seen many more in client logs.
>
> Yeah, I can reproduce that. There's clearly something wrong.

> I believe this is an OpenSSL bug. I traced the "unexpected message" error to
> this code in OpenSSL's s3_read_bytes() function:

Yep, I got to that as well.

> > case SSL3_RT_APPLICATION_DATA:
> > /*
> > * At this point, we were expecting handshake data, but have
> > * application data. If the library was running inside ssl3_read()
> > * (i.e. in_read_app_data is set) and it makes sense to read
> > * application data at this point (session renegotiation not yet
> > * started), we will indulge it.
> > */
> > if (s->s3->in_read_app_data &&
> > (s->s3->total_renegotiations != 0) &&
> > (((s->state & SSL_ST_CONNECT) &&
> > (s->state >= SSL3_ST_CW_CLNT_HELLO_A) &&
> > (s->state <= SSL3_ST_CR_SRVR_HELLO_A)
> > ) || ((s->state & SSL_ST_ACCEPT) &&
> > (s->state <= SSL3_ST_SW_HELLO_REQ_A) &&
> > (s->state >= SSL3_ST_SR_CLNT_HELLO_A)
> > )
> > )) {
> > s->s3->in_read_app_data = 2;
> > return (-1);
> > } else {
> > al = SSL_AD_UNEXPECTED_MESSAGE;
> > SSLerr(SSL_F_SSL3_READ_BYTES, SSL_R_UNEXPECTED_RECORD);
> > goto f_err;
> > }
>
> So that quite clearly throws an error, *unless* the original application
> call was SSL_read(). As you also concluded, OpenSSL doesn't like it when
> SSL_write() is called when it's in renegotiation. I think that's OpenSSL's
> fault and it should "indulge" the application data message even in
> SSL_write().

That'd be nice, but I think there were multiple reports to them about
this and there wasn't much of a response. Maybe it's time to try again.

> In theory, I guess we should do similar hacks in the server, and always call
> SSL_read() before SSL_write(), but it seems to work without it. Or maybe
> not; OpenSSL server and client code is not symmetric, so perhaps it works in
> server mode without that.

I think we should do it on the server side - got some server side errors
when I cranked up the pg_receivexlog's status interval and set the wal
sender timeout very low.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-02-12 00:03:40 Re: libpq's multi-threaded SSL callback handling is busted
Previous Message Andres Freund 2015-02-11 23:33:39 Re: SSL renegotiation and other related woes