Re: Rare SSL failures on eelpout

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rare SSL failures on eelpout
Date: 2019-03-04 21:08:02
Message-ID: 26030.1551733682@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
>> That suggests that we could perhaps handle ECONNRESET both at startup
>> packet send time (for certificate rejection, eelpout's case) and at
>> initial query send (for idle timeout, bug #15598's case) by attempting
>> to read. Does that make sense?

> Hmm ... it definitely makes sense that we shouldn't assume that a *write*
> failure means there is nothing left to *read*.

After staring at the code for a while, I am thinking that there may be
a bug of that ilk, but if so it's down inside OpenSSL. Perhaps it's
specific to the OpenSSL version you're using on eelpout? There is
not anything we could do differently in libpq, AFAICS, because it's
OpenSSL's responsibility to read any data that might be available.
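
To illustrate the general point that a failed write does not imply an
empty read side, here is a minimal sketch. It is not libpq or OpenSSL
code. It assumes Linux (for MSG_NOSIGNAL) and uses a plain AF_UNIX
socketpair instead of TLS over TCP, so the failed send reports EPIPE
rather than the ECONNRESET seen in this thread. The function name and
the message text are made up for the demonstration.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo: the "server" end of a socketpair queues a final
 * message and closes; the "client" end then tries to send, fails, and
 * reads anyway.
 *
 * Returns the number of bytes read after the send attempt, or -1 if the
 * setup failed.  *send_errno receives the errno of the failed send, or 0
 * if the send unexpectedly succeeded.
 */
ssize_t
read_after_failed_write(char *buf, size_t buflen, int *send_errno)
{
    static const char farewell[] = "FATAL: certificate rejected";
    int         sv[2];
    ssize_t     nread;

    assert(buf != NULL && buflen > 1 && send_errno != NULL);
    *send_errno = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Server side: queue the last message, then hang up. */
    if (write(sv[1], farewell, strlen(farewell)) < 0)
    {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[1]);

    /* Client side: the peer is gone, so this send is expected to fail. */
    if (send(sv[0], "startup", 7, MSG_NOSIGNAL) < 0)
        *send_errno = errno;

    /* The write failed, but the queued message can still be read. */
    nread = recv(sv[0], buf, buflen - 1, 0);
    if (nread >= 0)
        buf[nread] = '\0';

    close(sv[0]);
    return nread;
}
```

A caller passes a buffer and an int for the errno, then checks both
results. With TLS in the picture the read has to go through SSL_read, so
the sketch only shows the socket-level behavior, not what OpenSSL does
internally.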

I also looked into the idea that we're doing something wrong on the
server side, allowing the final error message to not get flushed out.
A plausible theory there is that SSL_shutdown is returning a WANT_READ
or WANT_WRITE error and we should retry it ... but that doesn't square
with your observation upthread that it's returning SSL_ERROR_SSL.

It's all very confusing, but I think there's a nontrivial chance
that this is an OpenSSL bug, especially since we haven't been able
to replicate it elsewhere.

regards, tom lane
