Quick Links

Re: Support for NSS as a libpq TLS backend

From:	Daniel Gustafsson <daniel(at)yesql(dot)se>
To:	Jacob Champion <pchampion(at)vmware(dot)com>
Cc:	"pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "andrew(dot)dunstan(at)2ndquadrant(dot)com" <andrew(dot)dunstan(at)2ndquadrant(dot)com>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "sfrost(at)snowman(dot)net" <sfrost(at)snowman(dot)net>
Subject:	Re: Support for NSS as a libpq TLS backend
Date:	2021-06-16 19:52:47
Message-ID:	FBBFDB5D-644C-45E7-B8F0-016068464B0C@yesql.se
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

> On 16 Jun 2021, at 18:15, Jacob Champion <pchampion(at)vmware(dot)com> wrote:
>
> On Wed, 2021-06-16 at 15:31 +0200, Daniel Gustafsson wrote:
>>> On 16 Jun 2021, at 01:50, Jacob Champion <pchampion(at)vmware(dot)com> wrote:
>>> I've been tracking down reference leaks in the client. These open
>>> references prevent NSS from shutting down cleanly, which then makes it
>>> impossible to open a new context in the future. This probably affects
>>> other libpq clients more than it affects psql.
>>
>> Ah, nice catch, that's indeed a bug in the frontend implementation. The
>> problem is that the NSS trustdomain cache *must* be empty before shutting down
>> the context, else this very issue happens. Note this in be_tls_destroy():
>>
>> /*
>> * It reads a bit odd to clear a session cache when we are destroying the
>> * context altogether, but if the session cache isn't cleared before
>> * shutting down the context it will fail with SEC_ERROR_BUSY.
>> */
>> SSL_ClearSessionCache();
>>
>> Calling SSL_ClearSessionCache() in pgtls_close() fixes the error.
>
> That's unfortunate. The session cache is global, right? So I'm guessing
> we'll need to refcount and lock that call, to avoid cleaning up out
> from under a thread that's actively using the the cache?

I'm not sure, the documentation doesn't give any answers and implementations of
libnss tend to just clear the cache without consideration. In libcurl we do
just that, and haven't had any complaints - which doesn't mean it's correct but
it's a datapoint.

>> There is another resource leak left (visible in one test after the above is
>> added), the SECMOD module needs to be unloaded in case it's been loaded.
>> Implementing that with SECMOD_UnloadUserModule trips a segfault in NSS which I
>> have yet to figure out (when acquiring a lock with NSSRWLock_LockRead).
>>
>> [...]
>>
>> With your patches I'm seeing a couple of these:
>>
>> SSL error: The one-time function was previously called and failed. Its error code is no longer available
>
> Hmm. Adding SSL_ClearSessionCache() (without thread-safety at the
> moment) fixes all of the SSL tests for me, and I don't see either the
> SECMOD leak or the "one-time function" error that you've mentioned.

Reading the code I don't think a loaded user module is considered a resource
that must've been released prior to closing the context. I will dig for what
showed up in my tests, but I don't think it was caused by this.

> What version of NSS are you running? I'm on 3.63.

Right now I'm using what Debian 10 is packaging which is 3.42. Admittedly not
hot off the press but I've been trying to develop off a packaged version which
we might see users wanting to deploy against should this get shipped.

--
Daniel Gustafsson https://vmware.com/

In response to

Re: Support for NSS as a libpq TLS backend at 2021-06-16 16:15:56 from Jacob Champion

Responses

Re: Support for NSS as a libpq TLS backend at 2021-06-23 13:48:59 from Daniel Gustafsson

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Matthias van de Meent	2021-06-16 20:08:39	Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Previous Message	Tom Lane	2021-06-16 19:47:56	Re: Improving the isolationtester: fewer failures, less delay