Re: Support for NSS as a libpq TLS backend

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Jacob Champion <pchampion(at)vmware(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "andrew(dot)dunstan(at)2ndquadrant(dot)com" <andrew(dot)dunstan(at)2ndquadrant(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "sfrost(at)snowman(dot)net" <sfrost(at)snowman(dot)net>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>
Subject: Re: Support for NSS as a libpq TLS backend
Date: 2021-06-16 13:31:42
Lists: pgsql-hackers

> On 16 Jun 2021, at 01:50, Jacob Champion <pchampion(at)vmware(dot)com> wrote:

> I've been tracking down reference leaks in the client. These open
> references prevent NSS from shutting down cleanly, which then makes it
> impossible to open a new context in the future. This probably affects
> other libpq clients more than it affects psql.

Ah, nice catch, that's indeed a bug in the frontend implementation. The
problem is that the NSS trustdomain cache *must* be empty before shutting down
the context, else this very issue happens. Note this in be_tls_destroy():

* It reads a bit odd to clear a session cache when we are destroying the
* context altogether, but if the session cache isn't cleared before
* shutting down the context it will fail with SEC_ERROR_BUSY.

Calling SSL_ClearSessionCache() in pgtls_close() fixes the error.
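
The ordering constraint can be illustrated with a toy model that needs no NSS
headers. Every name below (toy_context, toy_handshake, toy_clear_session_cache,
toy_shutdown, toy_close, TOY_ERROR_BUSY) is a hypothetical stand-in and not NSS
or libpq API. It only mimics the behaviour described above, where shutdown
reports a busy error while cached sessions still hold references:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; none of these names exist in NSS. */
typedef enum { TOY_SUCCESS = 0, TOY_ERROR_BUSY = -1 } toy_status;

typedef struct
{
    int  cached_sessions;   /* references held by the session cache */
    bool open;              /* true until shutdown succeeds */
} toy_context;

void
toy_init(toy_context *ctx)
{
    ctx->cached_sessions = 0;
    ctx->open = true;
}

/* A completed handshake leaves one entry in the session cache. */
void
toy_handshake(toy_context *ctx)
{
    assert(ctx->open);
    ctx->cached_sessions++;
}

void
toy_clear_session_cache(toy_context *ctx)
{
    ctx->cached_sessions = 0;
}

/* Shutdown refuses to proceed while the cache still holds references. */
toy_status
toy_shutdown(toy_context *ctx)
{
    if (ctx->cached_sessions > 0)
        return TOY_ERROR_BUSY;
    ctx->open = false;
    return TOY_SUCCESS;
}

/* Mirrors the shape of the fix: clear the cache first, then shut down. */
toy_status
toy_close(toy_context *ctx)
{
    toy_clear_session_cache(ctx);
    return toy_shutdown(ctx);
}
```

In this model, calling toy_shutdown() directly after a handshake returns
TOY_ERROR_BUSY and leaves the context open, while toy_close() succeeds.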

There is another resource leak left (visible in one test after the above is
added): the SECMOD module needs to be unloaded if it has been loaded.
Implementing that with SECMOD_UnloadUserModule trips a segfault in NSS which I
have yet to figure out (when acquiring a lock with NSSRWLock_LockRead).

> The first step to fixing that is not ignoring failures during NSS
> shutdown, so I've tried a patch to pgtls_close() that pushes any
> failures through the pqInternalNotice(). That seems to be working well.

I'm keeping these in during hacking, with a comment that they need to be
revisited during review since they are mainly useful for debugging.

> The tests were still mostly green, so I taught connect_ok() to fail if
> any stderr showed up, and that exposed quite a few failures.

With your patches I'm seeing a couple of these:

SSL error: The one-time function was previously called and failed. Its error code is no longer available

This is an error from NSPR, but it's not clear to me which PR_CallOnce call
it's coming from. It seems to be hitting in the SAN and CRL tests, so it
smells of some form of caching implemented with NSPR APIs to me, but that's a
mere hunch.
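
For context on why that message hides the root cause: as I understand it, a
call-once primitive runs the one-time function a single time and remembers
only whether it succeeded, so later callers see a generic "previously failed"
error instead of the original one. The following is a toy model of that
behaviour under that assumption, not NSPR code. All names (once_state,
once_call, once_failing_init, the ONCE_* constants) are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not NSPR API. */
typedef enum { ONCE_OK = 0, ONCE_FAIL = -1 } once_result;

enum { ONCE_ERR_NONE = 0, ONCE_ERR_ORIGINAL = 1, ONCE_ERR_PREVIOUSLY_FAILED = 2 };

typedef struct
{
    bool        initialized;    /* has the one-time function run yet? */
    once_result status;         /* its remembered result */
} once_state;

typedef once_result (*once_fn) (void);

int once_last_error = ONCE_ERR_NONE;    /* models a per-thread error code */
int once_init_calls = 0;                /* counts real invocations */

/* Run fn at most once; later calls only replay the remembered result. */
once_result
once_call(once_state *once, once_fn fn)
{
    assert(once != NULL && fn != NULL);

    if (!once->initialized)
    {
        once->initialized = true;
        once->status = fn();
        return once->status;
    }

    /* The original error code is gone; only a generic one can be reported. */
    if (once->status != ONCE_OK)
        once_last_error = ONCE_ERR_PREVIOUSLY_FAILED;
    return once->status;
}

/* A one-time initializer that fails and sets a specific error code. */
once_result
once_failing_init(void)
{
    once_init_calls++;
    once_last_error = ONCE_ERR_ORIGINAL;
    return ONCE_FAIL;
}
```

In this model only the first caller can observe ONCE_ERR_ORIGINAL. Every later
caller gets ONCE_ERR_PREVIOUSLY_FAILED, which would be consistent with the
failing call site being hard to identify from the message alone.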

> I am currently stuck on one last failing test. This leak seems to only
> show up when using TLSv1.2 or below.

AFAICT the session cache is avoided for TLSv1.3 due to 1.3 not supporting

Daniel Gustafsson
