Re: Rare SSL failures on eelpout

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rare SSL failures on eelpout
Date: 2019-01-22 22:22:48
Message-ID: CAEepm=0sHUVZHfz3Bcxaqj3YQwXAX2za_AKtEhZc2gxAomdEDQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 23, 2019 at 4:07 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> > Hmm. Why is psql doing two sendto() calls without reading a response
> > in between, when it's possible for the server to exit after the first,
> > anyway? Seems like a protocol violation somewhere?
>
> Keep in mind this is all down inside the SSL handshake, so if any
> protocol is being violated, it's theirs not ours.

The sendto() of 1115 bytes is SSL_connect()'s last syscall, just
before it returns 1 to indicate success (even though it wasn't
successful?), without waiting for a further response. The sendto() of
107 bytes is our start-up packet, which either succeeds and is
followed by reading a "certificate revoked" message from the server,
or fails with ECONNRESET if the socket has already been shut down at
the server end due to the racing exit.
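
To make the race concrete, here is a minimal Python sketch. It is not taken from libpq or OpenSSL: it uses plain TCP with no TLS, the function name and the dummy payloads are made up for illustration, and only the 1115 and 107 byte sizes come from the trace above. The client writes twice without reading in between, while the peer closes its end right after accepting. What the second exchange reports depends on platform and timing.

```python
import errno
import socket
import threading
import time


def send_twice_to_exiting_server(first_len=1115, second_len=107):
    """Send two messages with no read in between to a peer that exits early.

    Returns a short label for what the second exchange saw: an errno name
    such as "ECONNRESET" or "EPIPE", or "EOF" if the send went through and
    the following read hit end-of-stream.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    def exiting_server():
        # Stand-in for the racing server exit: accept, then close
        # without reading anything the client sent.
        conn, _ = listener.accept()
        conn.close()

    server = threading.Thread(target=exiting_server, daemon=True)
    server.start()

    client = socket.create_connection(listener.getsockname())
    try:
        # First write: stands in for the last handshake message.
        client.sendall(b"h" * first_len)
        server.join()    # the peer has now closed its end
        time.sleep(0.1)  # give the peer's reset time to arrive
        try:
            # Second write: stands in for the start-up packet.
            client.sendall(b"s" * second_len)
            return "EOF" if client.recv(1) == b"" else "DATA"
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        client.close()
        listener.close()
```

Calling `print(send_twice_to_exiting_server())` shows which of these outcomes a given system produces. The point is only that the failure surfaces on the second write or the read after it, not on the first write.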

It seems very strange to me that the error report is deferred until we
send our start-up packet. It looks like a response that belongs to
the connection attempt, not to our later send. Bug in OpenSSL?
Unintended consequence of our switch to blocking I/O at that point?

I tried to find out how this looked on 1.0.2, but it looks like Debian
has just removed the older version from the buster distro and I'm out
of time to hunt this on other OSes today.

> The whole thing reminds me of the recent bug #15598:
>
> https://www.postgresql.org/message-id/87k1iy44fd.fsf%40news-spur.riddles.org.uk

Yeah, if errors get moved to later exchanges but the server might exit
and close its end of the socket before we can manage to initiate a
later exchange, it starts to look just like that.

A less interesting bug is the appearance of three nonsensical "Success"
(glibc) or "No error: 0" (FreeBSD) error messages in the server logs
on systems running OpenSSL 1.1.1, which I guess might mean EOF. It
looks much like this:

https://www.postgresql.org/message-id/CAEepm=3cc5wYv=X4Nzy7VOUkdHBiJs9bpLzqtqJWxdDUp5DiPQ@mail.gmail.com
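
As a hedged illustration of where such a message can come from: formatting an errno of 0 with strerror() yields the platform's "no error" string, which is what ends up in the log. The helper below is hypothetical (its name and the "EOF detected" wording are assumptions, not PostgreSQL's actual code) and only sketches the guess above, namely treating a zero errno as EOF instead of printing it.

```python
import errno
import os


def describe_syscall_failure(saved_errno: int) -> str:
    """Hypothetical helper: turn a saved errno into log text.

    A zero errno carries no error information, so report it as EOF
    (an assumption, following the guess above) rather than passing it
    to strerror(), which would produce "Success" or "No error: 0".
    """
    if saved_errno == 0:
        return "EOF detected"
    return os.strerror(saved_errno)


if __name__ == "__main__":
    print(describe_syscall_failure(0))
    print(describe_syscall_failure(errno.ECONNRESET))
```

The exact text for a non-zero errno comes from the C library, so it differs between platforms.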

--
Thomas Munro
http://www.enterprisedb.com
