Re: Fix for OpenSSL error queue bug

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Zuelke <dz(at)heroku(dot)com>
Subject: Re: Fix for OpenSSL error queue bug
Date: 2016-03-14 22:50:39
Message-ID: CAM3SWZRsRqhymAP7vyq7Ai9+ATLXO4R2utn-cwRi=B61XFYR3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 14, 2016 at 3:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Agreed, we need to deal with this one way or the other. My proposal
> is:
>
> 1. In HEAD, do it as Peter E. suggests, ie clear error queue before calls.
>
> 2. In back branches, clear error queue before *and* after calls. This
> will waste a few nanoseconds but will avoid any risk of breaking
> existing third-party code.
>
> I think it's reasonable to expect extensions to update to the newer
> behavior in a major release, but we're taking risks if we expect
> that to happen in minor releases.

I am concerned that users will never be able to get this right, since
I think it requires every Ruby or PHP app using some thin OpenSSL
wrapper to clear the per-queue thread. It's a big mess, but it's our
mess to some degree.

I wonder if it would be just as good if we ensured that
ERR_get_error() was definitely called in any circumstance where it
looked like we might have an error: ERR_get_error() would be
*reliably* called, as in my patch, but unlike my patch only when
SSL_get_error() indicated a problem (not all the time).

Heikki believed that clearing the error queue by calling
ERR_clear_error() before calling an I/O function like SSL_read() was
necessary, as we all do; no controversy there. But Heikki also
believed, even prior to hearing about this bug, that it was important
and necessary for ERR_get_error() to be reached when there might be an
error added to the queue following a Postgres/libpq call to an I/O
function like SSL_read() followed by SSL_get_error() indicating
trouble. He thought, as I do, that it would be a good idea to not rely
on that happening from a distance (i.e. not relying on reaching
SSLerrmessage()). Peter E. seems to believe that there is absolutely
no reason to rely on ERR_get_error() getting called at all, and that
the existing SSLerrmessage() only exists for the purposes of producing
a human-readable error message.

Anyway, thinking about it some more, perhaps the best solution is to
do the ERR_get_error() call iff SSL_get_error() seems unhappy, perhaps
even placing the two into a utility function. That's probably almost
the same as the existing behavior, as far as clearing up the queue
after we may have added to it goes. I don't know if that's any less
safe then my patch's pessimistic approach. It seems like it might be
just as safe. Under this compromise, I think we'd probably clear the
error queue after SSL_get_error() returned a value that is not
SSL_ERROR_NONE, though (including SSL_ERROR_WANT_READ, etc). What do
you think about that?

> In any case, we need a patch that touches the backend-side code as well.

I agree that the backend-side code should be covered. I quickly
changed my mind about that, and am happy to produce a revision along
those lines.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-03-14 23:05:13 Re: Fix for OpenSSL error queue bug
Previous Message Tomas Vondra 2016-03-14 22:39:36 Re: Parallel Aggregate