From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: SSL renegotiation and other related woes |
Date: | 2015-02-11 23:41:37 |
Message-ID: | 20150211234137.GT21017@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2015-02-05 23:03:02 +0200, Heikki Linnakangas wrote:
> On 01/26/2015 12:14 PM, Andres Freund wrote:
> >Hi,
> >
> >When working on getting rid of ImmediateInterruptOK I wanted to verify
> >that ssl still works correctly. Turned out it didn't. But neither did it
> >in master.
> >
> >Turns out there's two major things we do wrong:
> >
> >1) We ignore the rule that once called and returning
> > SSL_ERROR_WANTS_(READ|WRITE) SSL_read()/write() have to be called
> > again with the same parameters. Unfortunately that rule doesn't mean
> > just that the same parameters have to be passed in, but also that we
> > can't just constantly switch between _read()/write(). Especially
> > nonblocking backend code (i.e. walsender) and the whole frontend code
> > violate this rule.
>
> I don't buy that. Googling around, and looking at the man pages, I just
> don't see anything to support that claim. I believe it is supported to
> switch between reads and writes.
There's at least two implementations that seem to have workaround to
achieve something like that. Apache's mod_ssl and the ssl layer for
libevent.
> >2) We start renegotiations in be_tls_write() while in nonblocking mode,
> > but don't properly retry to handle socket readyness. There's a loop
> > that retries handshakes twenty times (???), but what actually is
> > needed is to call SSL_get_error() and ensure that there's actually
> > data available.
>
> Yeah, that's just crazy.
>
> Actually, I don't think we need to call SSL_do_handshake() at all. At least
> on modern OpenSSL versions. OpenSSL internally uses a state machine, and
> SSL_read() and SSL_write() calls will complete the handshake just as well.
Some blaming seems to show that that's been the case for a long time.
> >2) is easy enough to fix, but 1) is pretty hard. Before anybody says
> >that 1) isn't an important rule: It reliably causes connection aborts
> >within a couple renegotiations. The higher the latency the higher the
> >likelihood of aborts. Even locally it doesn't take very long to
> >abort. Errors usually are something like "SSL connection has been closed
> >unexpectedly" or "SSL Error: sslv3 alert unexpected message" and a host
> >of other similar messages. There's a couple reports of those in the
> >archives and I've seen many more in client logs.
>
> Yeah, I can reproduce that. There's clearly something wrong.
> I believe this is an OpenSSL bug. I traced the "unexpected message" error to
> this code in OpenSSL's s3_read_bytes() function:
Yep, I got to that as well.
> > case SSL3_RT_APPLICATION_DATA:
> > /*
> > * At this point, we were expecting handshake data, but have
> > * application data. If the library was running inside ssl3_read()
> > * (i.e. in_read_app_data is set) and it makes sense to read
> > * application data at this point (session renegotiation not yet
> > * started), we will indulge it.
> > */
> > if (s->s3->in_read_app_data &&
> > (s->s3->total_renegotiations != 0) &&
> > (((s->state & SSL_ST_CONNECT) &&
> > (s->state >= SSL3_ST_CW_CLNT_HELLO_A) &&
> > (s->state <= SSL3_ST_CR_SRVR_HELLO_A)
> > ) || ((s->state & SSL_ST_ACCEPT) &&
> > (s->state <= SSL3_ST_SW_HELLO_REQ_A) &&
> > (s->state >= SSL3_ST_SR_CLNT_HELLO_A)
> > )
> > )) {
> > s->s3->in_read_app_data = 2;
> > return (-1);
> > } else {
> > al = SSL_AD_UNEXPECTED_MESSAGE;
> > SSLerr(SSL_F_SSL3_READ_BYTES, SSL_R_UNEXPECTED_RECORD);
> > goto f_err;
> > }
>
> So that quite clearly throws an error, *unless* the original application
> call was SSL_read(). As you also concluded, OpenSSL doesn't like it when
> SSL_write() is called when it's in renegotiation. I think that's OpenSSL's
> fault and it should "indulge" the application data message even in
> SSL_write().
That'd be nice, but I think there were multiple reports to them about
this and there wasn't much of a response. Maybe it's time to try again.
> In theory, I guess we should do similar hacks in the server, and always call
> SSL_read() before SSL_write(), but it seems to work without it. Or maybe
> not; OpenSSL server and client code is not symmetric, so perhaps it works in
> server mode without that.
I think we should do it on the server side - got some server side errors
when I cranked up the pg_receivexlog's status interval and set the wal
sender timeout very low.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2015-02-12 00:03:40 | Re: libpq's multi-threaded SSL callback handling is busted |
Previous Message | Andres Freund | 2015-02-11 23:33:39 | Re: SSL renegotiation and other related woes |