Re: Support for NSS as a libpq TLS backend

From: Jacob Champion <pchampion(at)vmware(dot)com>
To: "daniel(at)yesql(dot)se" <daniel(at)yesql(dot)se>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "andrew(dot)dunstan(at)2ndquadrant(dot)com" <andrew(dot)dunstan(at)2ndquadrant(dot)com>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "sfrost(at)snowman(dot)net" <sfrost(at)snowman(dot)net>
Subject: Re: Support for NSS as a libpq TLS backend
Date: 2021-06-16 16:15:56
Message-ID: 14b33300304507db6f4e4d8a77da105a8d18ea1a.camel@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2021-06-16 at 15:31 +0200, Daniel Gustafsson wrote:
> > On 16 Jun 2021, at 01:50, Jacob Champion <pchampion(at)vmware(dot)com> wrote:
> > I've been tracking down reference leaks in the client. These open
> > references prevent NSS from shutting down cleanly, which then makes it
> > impossible to open a new context in the future. This probably affects
> > other libpq clients more than it affects psql.
>
> Ah, nice catch, that's indeed a bug in the frontend implementation. The
> problem is that the NSS trustdomain cache *must* be empty before shutting down
> the context, else this very issue happens. Note this in be_tls_destroy():
>
> /*
> * It reads a bit odd to clear a session cache when we are destroying the
> * context altogether, but if the session cache isn't cleared before
> * shutting down the context it will fail with SEC_ERROR_BUSY.
> */
> SSL_ClearSessionCache();
>
> Calling SSL_ClearSessionCache() in pgtls_close() fixes the error.

That's unfortunate. The session cache is global, right? So I'm guessing
we'll need to refcount and lock that call, to avoid cleaning up out
from under a thread that's actively using the the cache?

> There is another resource leak left (visible in one test after the above is
> added), the SECMOD module needs to be unloaded in case it's been loaded.
> Implementing that with SECMOD_UnloadUserModule trips a segfault in NSS which I
> have yet to figure out (when acquiring a lock with NSSRWLock_LockRead).
>
> [...]
>
> With your patches I'm seeing a couple of these:
>
> SSL error: The one-time function was previously called and failed. Its error code is no longer available

Hmm. Adding SSL_ClearSessionCache() (without thread-safety at the
moment) fixes all of the SSL tests for me, and I don't see either the
SECMOD leak or the "one-time function" error that you've mentioned.
What version of NSS are you running? I'm on 3.63.

I've attached my current patchset (based on v37) for comparison.

> > I am currently stuck on one last failing test. This leak seems to only
> > show up when using TLSv1.2 or below.
>
> AFAICT the session cache is avoided for TLSv1.3 due to 1.3 not supporting
> renegotiation.

Nice, at least that mystery is solved. :D

Thanks,
--Jacob

Attachment Content-Type Size
0001-nss-don-t-ignore-failures-during-context-shutdown.patch.txt text/plain 1.3 KB
0002-test-check-for-empty-stderr-during-connect_ok.patch.txt text/plain 3.6 KB
0003-nss-clean-up-leaked-resources.patch.txt text/plain 3.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2021-06-16 16:23:28 PoC: Using Count-Min Sketch for join cardinality estimation
Previous Message Stephen Frost 2021-06-16 16:11:45 Re: snapshot too old issues, first around wraparound and then more.