Re: Rare SSL failures on eelpout

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rare SSL failures on eelpout
Date: 2019-03-05 16:23:23
Message-ID: CA+hUKGJafyTgpsYBgQGt1EX0O8UnL4VGHSc7J0KZyMH4_jPGBw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 6, 2019 at 3:33 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > Disappointingly, that turned out to be just because 10 and earlier
> > didn't care what the error message said.
>
> That is, you can reproduce the failure on old branches? That lets
> out a half-theory I'd had, which was that Andres' changes to make
> the backend always run its socket in nonblock mode had had something
> to do with it. (Those changes do represent a plausible reason why
> SSL_shutdown might be returning WANT_READ/WANT_WRITE; but I'm not
> in a hurry to add such code without evidence that it actually
> happens and something useful would change if we retry.)

Yes, on REL_10_STABLE:

$ for i in `seq 1 1000` ; do
    psql "host=localhost port=56024 dbname=certdb user=postgres sslcert=ssl/client-revoked.crt sslkey=ssl/client-revoked.key"
  done
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
...
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
could not send startup packet: Connection reset by peer
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
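
To put a number on how rare the odd result is, the loop's output can be tallied. The following is only a sketch: the two message fragments are copied from the output above, while the function names, the "-c" query, and the use of subprocess are assumptions added for illustration.

```python
import subprocess
from collections import Counter

# Message fragments copied from the psql output above.
REVOKED = "certificate revoked"
RESET = "Connection reset by peer"


def classify(stderr_text):
    """Label one psql attempt by what its stderr contains."""
    if RESET in stderr_text:
        return "reset"
    if REVOKED in stderr_text:
        return "revoked"
    return "other"


def run_attempts(conninfo, attempts=1000):
    """Run psql repeatedly and count each outcome (needs psql on PATH).

    The "-c" query is an assumption so psql never waits for input;
    with a revoked certificate the connection fails before it runs.
    """
    tally = Counter()
    for _ in range(attempts):
        result = subprocess.run(
            ["psql", "-c", "SELECT 1", conninfo],
            capture_output=True,
            text=True,
            stdin=subprocess.DEVNULL,
        )
        tally[classify(result.stderr)] += 1
    return tally
```

Calling run_attempts() with the connection string from the loop above should give a count per outcome, with "reset" being the rare case.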

Zooming in with strace:

sendto(3, "\27\3\3\2\356\r\214\352@\21\320\202\236}\376\367\262\227\177\255\212\204`q\254\108\326\201+c)"..., 1115, MSG_NOSIGNAL, NULL, 0) = 1115
ppoll([{fd=3, events=POLLOUT|POLLERR}], 1, NULL, NULL, 0) = 1 ([{fd=3, revents=POLLOUT|POLLERR|POLLHUP}])
sendto(3, "\27\3\3\0cW_\210\337Q\227\360\216k\221\346\372pw\27\325P\203\357\245km\304Rx\355\200"..., 104, MSG_NOSIGNAL, NULL, 0) = -1 ECONNRESET (Connection reset by peer)

You can see that poll() already knew the other end had closed the
socket. Since this is clearly timing-dependent... let's see, yeah, I can
make it fail every time by adding sleep(1) before the comment "Send the
startup packet". I assume that'll work on any Linux machine?
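
The same poll-then-send sequence can be reproduced without SSL or PostgreSQL. This is a minimal sketch for Linux, not what libpq does: it assumes that closing the accepted socket with SO_LINGER set to a zero timeout sends a reset, as a stand-in for the server dropping the connection early. The function name and the payload are made up for illustration.

```python
import errno
import select
import socket
import struct


def provoke_reset():
    """Return (revents, send_errno) seen by a client whose peer reset the connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()

    # Assumption: linger on with a zero timeout makes close() send an RST,
    # standing in for the server going away before the client writes.
    server_side.setsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
    )
    server_side.close()

    poller = select.poll()
    # First wait until the reset has actually arrived, to remove the race.
    poller.register(client.fileno(), select.POLLIN)
    poller.poll(5000)
    # Then ask the same question as the ppoll() call in the trace.
    poller.modify(client.fileno(), select.POLLOUT | select.POLLERR)
    events = poller.poll(5000)
    revents = events[0][1] if events else 0

    send_errno = 0
    try:
        client.send(b"startup packet")
    except OSError as exc:
        send_errno = exc.errno
    finally:
        client.close()
        listener.close()
    return revents, send_errno


if __name__ == "__main__":
    revents, send_errno = provoke_reset()
    print("POLLHUP set:", bool(revents & select.POLLHUP))
    print("send errno:", errno.errorcode.get(send_errno, send_errno))
```

The first poll() plays the role of the sleep(1): it guarantees the peer is gone before the client tries to write.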

To set this test up, I ran a server with the following config:

ssl=on
ssl_ca_file='root+client_ca.crt'
ssl_cert_file='server-cn-only.crt'
ssl_key_file='server-cn-only.key'
ssl_crl_file='root+client.crl'

I copied those files out of src/test/ssl/ssl/. Then I ran the psql
command shown earlier. I think I had to chmod 600 the keys.
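
The copy and chmod steps could be scripted roughly as follows. This is a sketch under assumptions: the file names come from the config above, but the helper name is hypothetical, and treating every ".key" file as needing mode 600 is a guess based on the chmod remark. The client-revoked files used by psql would need the same treatment in whatever directory psql runs from.

```python
import os
import shutil

# File names taken from the server config above.
SERVER_FILES = [
    "root+client_ca.crt",
    "server-cn-only.crt",
    "server-cn-only.key",
    "root+client.crl",
]


def stage_ssl_files(source_dir, data_dir, names=SERVER_FILES):
    """Copy the named files into data_dir and make private keys owner-only."""
    copied = []
    for name in names:
        target = os.path.join(data_dir, name)
        shutil.copyfile(os.path.join(source_dir, name), target)
        if name.endswith(".key"):
            # Equivalent of "chmod 600" on the key file.
            os.chmod(target, 0o600)
        copied.append(target)
    return copied
```

For example, stage_ssl_files("src/test/ssl/ssl", data_dir) would stage the four server files, where data_dir is wherever the test server's configuration expects them.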

--
Thomas Munro
https://enterprisedb.com
