Re: Failing SSL connection due to weird interaction with openssl

From: Lars Kanis <lars(at)greiz-reinsdorf(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Failing SSL connection due to weird interaction with openssl
Date: 2012-11-11 14:55:11
Message-ID: 509FBC4F.7070907@greiz-reinsdorf.de
Lists: pgsql-hackers

On 06.11.2012 21:40, Robert Haas wrote:
> On Tue, Oct 23, 2012 at 4:09 AM, Lars Kanis <lars(at)greiz-reinsdorf(dot)de> wrote:
>> While investigating a ruby-pg issue [1], we noticed that a libpq SSL
>> connection can fail if the running application uses OpenSSL for other work,
>> too. The root cause is OpenSSL's thread-local error queue, which is used to
>> pass textual error messages to the application after a failed crypto
>> operation. If the application leaves errors on the queue, the
>> communication with the PostgreSQL server can fail with a message left over
>> from a previous failed OpenSSL operation, in particular when using
>> non-blocking operations on the socket. This OpenSSL issue is quite old; see
>> [3].
>>
>> For [1] it turned out that the issue breaks down into three parts:
>> 1. the ruby-openssl binding does not clear OpenSSL's thread-local error
>> queue after a certificate verification
>> 2. OpenSSL uses a shared error queue for different crypto contexts
>> 3. libpq does not ensure the error queue is clear before making SSL_* calls
>>
>> To 1: Messages left on the error queue can, in general, cause later
>> operations to fail. I'll talk to the ruby-openssl developers about how we
>> can avoid leaving messages on the queue.
>>
>> To 2: SSL_get_error() inspects the shared error queue under some conditions.
>> That may be poor API design, but it is documented behaviour [2], so we
>> certainly have to live with it.
>>
>> To 3: To make libpq independent of any previous error state, the error queue
>> could be cleared with a call to ERR_clear_error() prior to
>> SSL_connect/read/write, as in the attached trivial patch. This would make
>> libpq robust against other uses of OpenSSL within the application.
>>
>> What do you think about clearing the OpenSSL error queue in libpq in that
>> way?
> Shouldn't it be the job of whatever code is consuming the error to
> clear the error queue afterwards?
>
Yes, of course. I already filed a bug against ruby-openssl some weeks ago [1].

But IMHO libpq should be changed too, for the following reasons:

1. The behavior of libpq isn't consistent: blocking calls are
already agnostic to errors remaining in the OpenSSL queue, but
non-blocking calls are not. This is an OpenSSL quirk that is thereby
exposed through the libpq API.

2. libpq reports the wrong errors. Instead of an error like "Remaining
errors in openssl error queue. libpq requires a clear error queue in
order to work correctly.", it reports arbitrary foreign errors that may
or may not relate to libpq's own communication. The documentation for
SSL_get_error(3) is quite unambiguous about the need to clear the error
queue first.

3. The sensitivity of libpq to the error queue can lead to bugs that
are hard to track down, like this one [2]. This is because a libpq error
leads the developer to look for a bug related to the database
connection, although the issue is in a very different part of the code.

Regards,
Lars

[1] http://bugs.ruby-lang.org/issues/7215
[2] https://bitbucket.org/ged/ruby-pg/issue/142/async_exec-over-ssl-connection-can-fail-on
