Re: Fix for OpenSSL error queue bug

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Zuelke <dz(at)heroku(dot)com>
Subject: Re: Fix for OpenSSL error queue bug
Date: 2016-03-11 03:22:18
Message-ID: 56E239EA.4090204@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/10/16 9:38 PM, Peter Geoghegan wrote:
> Looked at your proposed patch. Will respond to your original mail on the matter.
>
> On Thu, Mar 3, 2016 at 4:15 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> I think clearing the error after a call is not necessary. The API
>> clearly requires that you should clear the error queue before a call, so
>> clearing it afterwards does not accomplish anything, except maybe make
>> broken code work sometimes, for a while.
>
> Uh, if it's so clear, then why haven't we been doing it all along?

The issue only happens if two interleaving trains of execution, one of
which is libpq, use OpenSSL. Not many applications do that. And you
also need to provoke the errors in a certain order. And even then, in
some cases you might just see a false positive error, rather than a
crash. So it's an edge case.

> Part of the problem is that various scripting language OpenSSL
> wrappers are only very thin wrappers. They effectively pass the buck
> on to PHP and Ruby devs. If we cannot get it right, what chance have
> they? I've personally experienced a bit uptick in complaints about
> this recently. I think there are 3 separate groups within Heroku that
> regularly ask me how this patch is doing.

I think they have been getting away with it for so long for the same
reasons.

Arguably, if everyone followed "my" approach, this should be very easy
to fix everywhere. Instead, reading through the PHP bug report, they
are coming up with a fairly complex solution for clearing the error
queue afterwards so as to not leave "landmines" for the next guy. But
their code will (AFAICT) still be wrong because they are not clearing
the error *before* the API calls where it is required per documentation.
So "everyone" (sample of 2) is scrambling to clean up for the next guy
instead of doing the straightforward fix of following the API
documentation and cleaning up before their own calls.

I also see the clean-up-afterwards approach in the Python ssl module. I
fear there is de facto a second API specification that requires you to
clean up errors after yourself and gives an implicit guarantee that the
error queue is empty whenever you want to make any SSL calls. I don't
think this actually works in all cases, but maybe if everyone else is
convinced of that (in plain violation of the published OpenSSL
documentation, AFAICT) we need to get on board with that for
interoperability.

>> Also, there is nothing that
>> says that an error produces exactly one entry in the error queue; it
>> could be multiple. Or that errors couldn't arise at random times
>> between the reset and whatever happens next.
>
> I think that it's kind of implied, since calling ERR_get_error() pops
> the stack. But even if that isn't so, it might be worth preventing bad
> things from happening to client applications only sometimes.

The lore on the internet suggests that multiple errors could definitely
happen. So popping one error afterwards isn't going to fix it, it just
moves the edge case around. At least what we should do is clear the
entire queue afterwards instead of just the first error.

> doesn't it give you pause? Heikki seemed to think that clearing our
> own queue was important when he looked at this a year ago:
>
> http://www.postgresql.org/message-id/54EDD30D.5050107@vmware.com

I think that message suggests that we should clear the queue before each
call, not after.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2016-03-11 03:22:54 Re: Adjusting the API of pull_var_clause()
Previous Message Peter Geoghegan 2016-03-11 02:54:10 Re: Using quicksort for every external sort run