Re: Fix for OpenSSL error queue bug

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Zuelke <dz(at)heroku(dot)com>
Subject: Re: Fix for OpenSSL error queue bug
Date: 2016-03-11 02:38:28
Message-ID: CAM3SWZRDFMoJpRy_WdnYDyAQ9+adkqEZQ5bdRyugJbFgEHQ7+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Looked at your proposed patch. Will respond to your original mail on the matter.

On Thu, Mar 3, 2016 at 4:15 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> I think clearing the error after a call is not necessary. The API
> clearly requires that you should clear the error queue before a call, so
> clearing it afterwards does not accomplish anything, except maybe make
> broken code work sometimes, for a while.

Uh, if it's so clear, then why haven't we been doing it all along? The
API doesn't require you to take *any* specific practical measure (for
example, the specific practical measure of resetting the queue before
calling an I/O function). It simply says "this exact thing cannot be
allowed to happen; the consequences are undefined", with nothing in
the way of guidance on what that means in the real world. It's a
shockingly bad API, but that's the reality.

Part of the problem is that various scripting language OpenSSL
wrappers are only very thin wrappers. They effectively pass the buck
on to PHP and Ruby devs. If we cannot get it right, what chance have
they? I've personally experienced a bit uptick in complaints about
this recently. I think there are 3 separate groups within Heroku that
regularly ask me how this patch is doing.

> Also, there is nothing that
> says that an error produces exactly one entry in the error queue; it
> could be multiple. Or that errors couldn't arise at random times
> between the reset and whatever happens next.

I think that it's kind of implied, since calling ERR_get_error() pops
the stack. But even if that isn't so, it might be worth preventing bad
things from happening to client applications only sometimes.

> I think this is analogous to clearing errno before a C library call.
> You could clear it afterwards as well, to be nice to the next guy, but
> the next guy should really take care of that themselves, and we can't
> rely on what happens in between anyway.

It sounds like you're saying "well, we cannot be expected to bend over
backwards to make broken code work". But that broken code includes
every single version of libpq + OpenSSL currently distributed. Seems
like a very high standard. I'm not saying that that means we
definitely should clear the error queue reliably ourselves, but
doesn't it give you pause? Heikki seemed to think that clearing our
own queue was important when he looked at this a year ago:

http://www.postgresql.org/message-id/54EDD30D.5050107@vmware.com

Again, not conclusive, but I would like to hear a rationale for why
you think it's okay to not consistently clear our own queue for the
benefit of others. Is this informed by a concern about some specific
downside to taking that extra precaution?

Thanks
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2016-03-11 02:54:10 Re: Using quicksort for every external sort run
Previous Message Robert Haas 2016-03-11 02:36:14 Re: [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.