Re: Support for NSS as a libpq TLS backend

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Support for NSS as a libpq TLS backend
Date: 2020-10-27 20:07:01
Message-ID: 907FA7D1-9BD9-464B-A6EC-DEEB53438D36@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 20 Oct 2020, at 21:15, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,

Thanks for your review, much appreciated!

> On 2020-10-20 14:24:24 +0200, Daniel Gustafsson wrote:
>> From 0cb0e6a0ce9adb18bc9d212bd03e4e09fa452972 Mon Sep 17 00:00:00 2001
>> From: Daniel Gustafsson <daniel(at)yesql(dot)se>
>> Date: Thu, 8 Oct 2020 18:44:28 +0200
>> Subject: [PATCH] Support for NSS as a TLS backend v12
>> ---
>> configure | 223 +++-
>> configure.ac | 39 +-
>> contrib/Makefile | 2 +-
>> contrib/pgcrypto/Makefile | 5 +
>> contrib/pgcrypto/nss.c | 773 +++++++++++
>> contrib/pgcrypto/openssl.c | 2 +-
>> contrib/pgcrypto/px.c | 1 +
>> contrib/pgcrypto/px.h | 1 +
>
> Personally I'd like to see this patch broken up a bit - it's quite
> large. Several of the changes could easily be committed separately, no?

Not sure how much of this makes sense committed separately (unless separately
means in quick succession), but it could certainly be broken up for the sake of
making review easier. I will take a stab at that, but in a follow-up email as
I would like the split to be a version just doing the split and not also
introducing/fixing things.

>> if test "$with_openssl" = yes ; then
>> + if test x"$with_nss" = x"yes" ; then
>> + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
>> + fi
>
> Based on a quick look there's no similar error check for the msvc
> build. Should there be?

Thats a good question. When embarking on this is seemed quite natural to me
that it should be, but now I'm not so sure. Maybe there should be a
--with-openssl-preferred like how we handle readline/libedit or just allow
multiple and let the last one win? Do you have any input on what would make
sense?

The only thing I think makes no sense is to allow multiple ones at the same
time given the current autoconf switches, even if it would just be to pick say
pg_strong_random from one and libpq TLS from another.

>> +if test "$with_nss" = yes ; then
>> + if test x"$with_openssl" = x"yes" ; then
>> + AC_MSG_ERROR([multiple SSL backends cannot be enabled simultaneously"])
>> + fi
>
> Isn't this a repetition of the earlier check?

It is, and it we want to keep such a check it should be broken out into a
separate step performed before all library specific checks IMO.

>> + CLEANLDFLAGS="$LDFLAGS"
>> + # TODO: document this set of LDFLAGS
>> + LDFLAGS="-lssl3 -lsmime3 -lnss3 -lplds4 -lplc4 -lnspr4 $LDFLAGS"
>
> Shouldn't this use nss-config or such?

Indeed it should, where available. I've added rudimentary support for that
without a fallback as of now.

>> +if test "$with_nss" = yes ; then
>> + AC_CHECK_HEADER(ssl.h, [], [AC_MSG_ERROR([header file <ssl.h> is required for NSS])])
>> + AC_CHECK_HEADER(nss.h, [], [AC_MSG_ERROR([header file <nss.h> is required for NSS])])
>> +fi
>
> Hm. For me, on debian, these headers are not directly in the default
> include search path, but would be as nss/ssl.h. I don't see you adding
> nss/ to CFLAGS anywhere? How does this work currently?

I had Stockholm-syndromed myself into passing --with-includes and hadn't really
realized. Sometimes the obvious is too obvious in a 4000+ LOC patch.

> I think it'd also be better if we could include these files as nss/ssl.h
> etc - ssl.h is a name way too likely to conflict imo.

I've changed this to be nss/ssl.h and nspr/nspr.h etc, but the include path
will still need the direct path to the headers (from autoconf) since nss.h
includes NSPR headers as #include <nspr.h> and so on.

>> +++ b/src/backend/libpq/be-secure-nss.c
>> @@ -0,0 +1,1158 @@
>> +/*
>> + * BITS_PER_BYTE is also defined in the NSPR header files, so we need to undef
>> + * our version to avoid compiler warnings on redefinition.
>> + */
>> +#define pg_BITS_PER_BYTE BITS_PER_BYTE
>> +#undef BITS_PER_BYTE
>
> Most compilers/preprocessors don't warn about redefinitions when they
> would result in the same value (IIRC we have some cases of redefinitions
> in tree even). Does nspr's differ?

GCC 8.3 in my Debian installation throws the below warning:

In file included from /usr/include/nspr/prtypes.h:26,
from /usr/include/nspr/pratom.h:14,
from /usr/include/nspr/nspr.h:9,
from be-secure-nss.c:45:
/usr/include/nspr/prcpucfg.h:1143: warning: "BITS_PER_BYTE" redefined
#define BITS_PER_BYTE PR_BITS_PER_BYTE

In file included from ../../../src/include/c.h:55,
from ../../../src/include/postgres.h:46,
from be-secure-nss.c:16:
../../../src/include/pg_config_manual.h:115: note: this is the location of the previous definition
#define BITS_PER_BYTE 8

PR_BITS_PER_BYTE is defined per platform in pr/include/md/_<platform>.cfg and
is as expected 8. I assume it's that indirection which cause the warning?

>> +/*
>> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with
>> + * colliding definitions from ours, causing a much expected compiler error.
>> + * The definitions are however not actually used in NSPR at all, and are only
>> + * intended for what seems to be backwards compatibility for apps written
>> + * against old versions of NSPR. The following comment is in the referenced
>> + * file, and was added in 1998:
>> + *
>> + * This section typedefs the old 'native' types to the new PR<type>s.
>> + * These definitions are scheduled to be eliminated at the earliest
>> + * possible time. The NSPR API is implemented and documented using
>> + * the new definitions.
>> + *
>> + * As there is no opt-out from pulling in these typedefs, we define the guard
>> + * for the file to exclude it. This is incredibly ugly, but seems to be about
>> + * the only way around it.
>> + */
>> +#define PROTYPES_H
>> +#include <nspr.h>
>> +#undef PROTYPES_H
>
> Yuck :(.

Thats not an understatement. Taking another dive into the NSPR code I did
however find a proper way to deal with this. Defining NO_NSPR_10_SUPPORT stops
NSPR from using the files in obsolete/. So fixed, yay!

>> +int
>> +be_tls_init(bool isServerStart)
>> +{
>> + SECStatus status;
>> + SSLVersionRange supported_sslver;
>> +
>> + /*
>> + * Set up the connection cache for multi-processing application behavior.
>
> Hm. Do we necessarily want that? Session resumption is not exactly
> unproblematic... Or does this do something else?

From my reading of the docs, and experience with the code, a server application
must set up a connection cache in order to accept connections. Not entirely
sure, and the docs aren't terribly clear for non SSLv2/v3 environments (it
seems to only cache for SSLv2/3 and not TLSv+) but it seems like it may have
other uses internally. I will hunt down some more information on the NSS
mailing list.

>> + * If we are in ServerStart then we initialize the cache. If the server is
>> + * already started, we inherit the cache such that it can be used for
>> + * connections. Calling SSL_ConfigMPServerSIDCache sets an environment
>> + * variable which contains enough information for the forked child to know
>> + * how to access it. Passing NULL to SSL_InheritMPServerSIDCache will
>> + * make the forked child look it up by the default name SSL_INHERITANCE,
>> + * if env vars aren't inherited then the contents of the variable can be
>> + * passed instead.
>> + */
>
> Does this stuff work on windows

According to the documentation it does, and Andrew had this working on Windows
in an earlier version of the patch. I need to get a proper Windows env for
testing/dev up and running as mine has bitrotted to nothingness.

> / EXEC_BACKEND?

That's a good point, maybe we need to do a SSL_ConfigServerSessionIDCache
rather than the MP version for EXEC_BACKEND? Not sure.

>> + * The below parameters are what the implicit initialization would've done
>> + * for us, and should work even for older versions where it might not be
>> + * done automatically. The last parameter, maxPTDs, is set to various
>> + * values in other codebases, but has been unused since NSPR 2.1 which was
>> + * released sometime in 1998.
>> + */
>> + PR_Init(PR_USER_THREAD, PR_PRIORITY_NORMAL, 0 /* maxPTDs */ );
>
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_Init
> says that currently all parameters are ignored?

Right, my comment didn't reflect that they're all dead these days, only that
one of them has been unused since RUN DMC topped the charts with "It's like
that". Comment updated.

>> + /*
>> + * Import the already opened socket as we don't want to use NSPR functions
>> + * for opening the network socket due to how the PostgreSQL protocol works
>> + * with TLS connections. This function is not part of the NSPR public API,
>> + * see the comment at the top of the file for the rationale of still using
>> + * it.
>> + */
>> + pr_fd = PR_ImportTCPSocket(port->sock);
>> + if (!pr_fd)
>> + ereport(ERROR,
>> + (errmsg("unable to connect to socket")));
>
> I don't see the comment you're referring to?

It's referring to the comment discussing PR_ImportTCPSocket being a private API
call, yet still used by everyone (which is also discussed later in this review).

>> + /*
>> + * Most of the documentation available, and implementations of, NSS/NSPR
>> + * use the PR_NewTCPSocket() function here, which has the drawback that it
>> + * can only create IPv4 sockets. Instead use PR_OpenTCPSocket() which
>> + * copes with IPv6 as well.
>> + */
>> + model = PR_OpenTCPSocket(port->laddr.addr.ss_family);
>> + if (!model)
>> + ereport(ERROR,
>> + (errmsg("unable to open socket")));
>> +
>> + /*
>> + * Convert the NSPR socket to an SSL socket. Ensuring the success of this
>> + * operation is critical as NSS SSL_* functions may return SECSuccess on
>> + * the socket even though SSL hasn't been enabled, which introduce a risk
>> + * of silent downgrades.
>> + */
>> + model = SSL_ImportFD(NULL, model);
>> + if (!model)
>> + ereport(ERROR,
>> + (errmsg("unable to enable TLS on socket")));
>
> It's confusing that these functions do not actually reference the socket
> via some handle :(. What does opening a socket do here?

This specific call converts the socket from a plain NSPR socket to an SSL/TLS
capable socket which NSS will work with. This is a required step for
"activating" NSS on the socket.

>> + /*
>> + * Configure the allowed cipher. If there are no user preferred suites,
>
> *ciphers?

Yes, fixed.

>> +
>> + port->pr_fd = SSL_ImportFD(model, pr_fd);
>> + if (!port->pr_fd)
>> + ereport(ERROR,
>> + (errmsg("unable to initialize")));
>> +
>> + PR_Close(model);
>
> A comment explaining why we first import a NULL into the model, and then
> release the model, and import the real fd would be good.

I've added a small comment to explain how the model is a configuration template
for the actual socket. This part of NSS/NSPR is a bit overcomplicated for how
we have connections, it's more geared towards having many open sockets in the
same process.

>> +ssize_t
>> +be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
>> +{
>> + ssize_t n_read;
>> + PRErrorCode err;
>> +
>> + n_read = PR_Read(port->pr_fd, ptr, len);
>> +
>> + if (n_read < 0)
>> + {
>> + err = PR_GetError();
>
> Yay, more thread global state :(.

Sorry about that.

>> + /* XXX: This logic seems potentially bogus? */
>> + if (err == PR_WOULD_BLOCK_ERROR)
>> + *waitfor = WL_SOCKET_READABLE;
>> + else
>> + *waitfor = WL_SOCKET_WRITEABLE;
>
> Don't we need to handle failed connections somewhere here? secure_read()
> won't know about PR_GetError() etc? How would SSL errors be signalled
> upwards here?
>
> Also, as you XXX, it's not clear to me that your mapping would always
> result in waiting for the right event? A tls write could e.g. very well
> require receiving data etc?

Fixed, but there might be more to be done here.

>> + /*
>> + * At least one byte with password content was returned, and NSS requires
>> + * that we return it allocated in NSS controlled memory. If we fail to
>> + * allocate then abort without passing back NULL and bubble up the error
>> + * on the PG side.
>> + */
>> + password = (char *) PR_Malloc(len + 1);
>> + if (!password)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_OUT_OF_MEMORY),
>> + errmsg("out of memory")));
>>
>> + strlcpy(password, buf, sizeof(password));
>> + explicit_bzero(buf, sizeof(buf));
>> +
>
> In case of error you're not bzero'ing out the password!

Fixed.

> Separately, I wonder if we should introduce a function for throwing OOM
> errors - which then e.g. could print the memory context stats in those
> places too...

+1. I'd be happy to review such a patch.

>> +static SECStatus
>> +pg_cert_auth_handler(void *arg, PRFileDesc * fd, PRBool checksig, PRBool isServer)
>> +{
>> + SECStatus status;
>> + Port *port = (Port *) arg;
>> + CERTCertificate *cert;
>> + char *peer_cn;
>> + int len;
>> +
>> + status = SSL_AuthCertificate(CERT_GetDefaultCertDB(), port->pr_fd, checksig, PR_TRUE);
>> + if (status == SECSuccess)
>> + {
>> + cert = SSL_PeerCertificate(port->pr_fd);
>> + len = strlen(cert->subjectName);
>> + peer_cn = MemoryContextAllocZero(TopMemoryContext, len + 1);
>> + if (strncmp(cert->subjectName, "CN=", 3) == 0)
>> + strlcpy(peer_cn, cert->subjectName + strlen("CN="), len + 1);
>> + else
>> + strlcpy(peer_cn, cert->subjectName, len + 1);
>> + CERT_DestroyCertificate(cert);
>> +
>> + port->peer_cn = peer_cn;
>> + port->peer_cert_valid = true;
>
> Hm. We either should have something similar to
>
> /*
> * Reject embedded NULLs in certificate common name to prevent
> * attacks like CVE-2009-4034.
> */
> if (len != strlen(peer_cn))
> {
> ereport(COMMERROR,
> (errcode(ERRCODE_PROTOCOL_VIOLATION),
> errmsg("SSL certificate's common name contains embedded null")));
> pfree(peer_cn);
> return -1;
> }
> here, or a comment explaining why not.

We should, but it's proving rather difficult as there is no equivalent API call
to get the string as well as the expected length of it.

> Also, what's up with the CN= bit? Why is that needed here, but not for
> openssl?

OpenSSL returns only the value portion, whereas NSS returns key=value so we
need to skip over the key= part.

>> +static PRFileDesc *
>> +init_iolayer(Port *port, int loglevel)
>> +{
>> + const PRIOMethods *default_methods;
>> + PRFileDesc *layer;
>> +
>> + /*
>> + * Start by initializing our layer with all the default methods so that we
>> + * can selectively override the ones we want while still ensuring that we
>> + * have a complete layer specification.
>> + */
>> + default_methods = PR_GetDefaultIOMethods();
>> + memcpy(&pr_iomethods, default_methods, sizeof(PRIOMethods));
>> +
>> + pr_iomethods.recv = pg_ssl_read;
>> + pr_iomethods.send = pg_ssl_write;
>> +
>> + /*
>> + * Each IO layer must be identified by a unique name, where uniqueness is
>> + * per connection. Each connection in a postgres cluster can generate the
>> + * identity from the same string as they will create their IO layers on
>> + * different sockets. Only one layer per socket can have the same name.
>> + */
>> + pr_id = PR_GetUniqueIdentity("PostgreSQL");
>
> Seems like it might not be a bad idea to append Server or something?

Fixed.

>> + /*
>> + * Create the actual IO layer as a stub such that it can be pushed onto
>> + * the layer stack. The step via a stub is required as we define custom
>> + * callbacks.
>> + */
>> + layer = PR_CreateIOLayerStub(pr_id, &pr_iomethods);
>> + if (!layer)
>> + {
>> + ereport(loglevel,
>> + (errmsg("unable to create NSS I/O layer")));
>> + return NULL;
>> + }
>
> Why is this accepting a variable log level? The only caller passes ERROR?

Good catch, that's a leftover from a previous version which no longer makes
sense. loglevel param removed.

>> +/*
>> + * pg_SSLerrmessage
>> + * Create and return a human readable error message given
>> + * the specified error code
>> + *
>> + * PR_ErrorToName only converts the enum identifier of the error to string,
>> + * but that can be quite useful for debugging (and in case PR_ErrorToString is
>> + * unable to render a message then we at least have something).
>> + */
>> +static char *
>> +pg_SSLerrmessage(PRErrorCode errcode)
>> +{
>> + char error[128];
>> + int ret;
>> +
>> + /* TODO: this should perhaps use a StringInfo instead.. */
>> + ret = pg_snprintf(error, sizeof(error), "%s (%s)",
>> + PR_ErrorToString(errcode, PR_LANGUAGE_I_DEFAULT),
>> + PR_ErrorToName(errcode));
>> + if (ret)
>> + return pstrdup(error);
>
>> + return pstrdup(_("unknown TLS error"));
>> +}
>
> Why not use psrintf() here?

Thats a good question to which I don't have a good answer. Changed to doing
just that.

>> +++ b/src/include/common/pg_nss.h
>> @@ -0,0 +1,141 @@
>> +/*-------------------------------------------------------------------------
>> + *
>> + * pg_nss.h
>> + * Support for NSS as a TLS backend
>> + *
>> + * These definitions are used by both frontend and backend code.
>> + *
>> + * Copyright (c) 2020, PostgreSQL Global Development Group
>> + *
>> + * IDENTIFICATION
>> + * src/include/common/pg_nss.h
>> + *
>> + *-------------------------------------------------------------------------
>> + */
>> +#ifndef PG_NSS_H
>> +#define PG_NSS_H
>> +
>> +#ifdef USE_NSS
>> +
>> +#include <sslproto.h>
>> +
>> +PRUint16 pg_find_cipher(char *name);
>> +
>> +typedef struct
>> +{
>> + const char *name;
>> + PRUint16 number;
>> +} NSSCiphers;
>> +
>> +#define INVALID_CIPHER 0xFFFF
>> +
>> +/*
>> + * This list is a partial copy of the ciphers in NSS files lib/ssl/sslproto.h
>> + * in order to provide a human readable version of the ciphers. It would be
>> + * nice to not have to have this, but NSS doesn't provide any API addressing
>> + * the ciphers by name. TODO: do we want more of the ciphers, or perhaps less?
>> + */
>> +static const NSSCiphers NSS_CipherList[] = {
>> +
>> + {"TLS_NULL_WITH_NULL_NULL", TLS_NULL_WITH_NULL_NULL},
>
> Hm. Is this whole business of defining array constants in a header just
> done to avoid having a .c file that needs to be compiled both in
> frontend and backend code?

That was the original motivation, but I guess I should just bit the bullet and
make it a .c compiled in both frontend and backend?

>> +/*
>> + * The nspr/obsolete/protypes.h NSPR header typedefs uint64 and int64 with
>> + * colliding definitions from ours, causing a much expected compiler error.
>> + * The definitions are however not actually used in NSPR at all, and are only
>> + * intended for what seems to be backwards compatibility for apps written
>> + * against old versions of NSPR. The following comment is in the referenced
>> + * file, and was added in 1998:
>> + *
>> + * This section typedefs the old 'native' types to the new PR<type>s.
>> + * These definitions are scheduled to be eliminated at the earliest
>> + * possible time. The NSPR API is implemented and documented using
>> + * the new definitions.
>> + *
>> + * As there is no opt-out from pulling in these typedefs, we define the guard
>> + * for the file to exclude it. This is incredibly ugly, but seems to be about
>> + * the only way around it.
>> + */
>
> There's a lot of duplicated comments here. Could we move either of the
> files to reference the other for longer ones?

I took a stab at this in the attached version. The code is perhaps over-
commented in parts but I tried to encode my understanding of NSS into the
comments where documentation is lacking, since I assume I'm not the only one
who is new to NSS. There might be a need to pare back to keep it focused in
case this patch goes futher.

>> +/*
>> + * PR_ImportTCPSocket() is a private API, but very widely used, as it's the
>> + * only way to make NSS use an already set up POSIX file descriptor rather
>> + * than opening one itself. To quote the NSS documentation:
>> + *
>> + * "In theory, code that uses PR_ImportTCPSocket may break when NSPR's
>> + * implementation changes. In practice, this is unlikely to happen because
>> + * NSPR's implementation has been stable for years and because of NSPR's
>> + * strong commitment to backward compatibility."
>> + *
>> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSPR/Reference/PR_ImportTCPSocket
>> + *
>> + * The function is declared in <private/pprio.h>, but as it is a header marked
>> + * private we declare it here rather than including it.
>> + */
>> +NSPR_API(PRFileDesc *) PR_ImportTCPSocket(int);
>
> Ugh. This is really the way to do this? How do other applications deal
> with this problem?

They either #include <private/pprio.h> or they do it like this (or vendor NSPR
which makes calling private APIs less problematic). It sure is ugly, but there
is no alternative to using this function.

>> +#if defined(WIN32)
>> +static const char *ca_trust_name = "nssckbi.dll";
>> +#elif defined(__darwin__)
>> +static const char *ca_trust_name = "libnssckbi.dylib";
>> +#else
>> +static const char *ca_trust_name = "libnssckbi.so";
>> +#endif
>
> There's really no pre-existing handling for this in nss???

NSS_Init does have more or less the above logic (see snippet below), but only
when there is a cert database defined.

/*
* The following code is an attempt to automagically find the external root
* module.
* Note: Keep the #if-defined chunks in order. HPUX must select before UNIX.
*/

static const char *dllname =
#if defined(XP_WIN32) || defined(XP_OS2)
"nssckbi.dll";
#elif defined(HPUX) && !defined(__ia64) /* HP-UX PA-RISC */
"libnssckbi.sl";
#elif defined(DARWIN)
"libnssckbi.dylib";
#elif defined(XP_UNIX) || defined(XP_BEOS)
"libnssckbi.so";
#else
#error "Uh! Oh! I don't know about this platform."
#endif

In the NSS_INIT_NOCERTDB case there is no such handling of the libname provided
by NSS so we need to do that ourselves.

>> + /*
>> + * The original design of NSS was for a single application to use a single
>> + * copy of it, initialized with NSS_Initialize() which isn't returning any
>> + * handle with which to refer to NSS. NSS initialization and shutdown are
>> + * global for the application, so a shutdown in another NSS enabled
>> + * library would cause NSS to be stopped for libpq as well. The fix has
>> + * been to introduce NSS_InitContext which returns a context handle to
>> + * pass to NSS_ShutdownContext. NSS_InitContext was introduced in NSS
>> + * 3.12, but the use of it is not very well documented.
>> + * https://bugzilla.redhat.com/show_bug.cgi?id=738456
>> + *
>> + * The InitParameters struct passed can be used to override internal
>> + * values in NSS, but the usage is not documented at all. When using
>> + * NSS_Init initializations, the values are instead set via PK11_Configure
>> + * calls so the PK11_Configure documentation can be used to glean some
>> + * details on these.
>> + *
>> + * https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/PKCS11/Module_Specs
>
>> +
>> + if (!nss_context)
>> + {
>> + char *err = pg_SSLerrmessage(PR_GetError());
>> +
>> + printfPQExpBuffer(&conn->errorMessage,
>> + libpq_gettext("unable to %s certificate database: %s"),
>> + conn->cert_database ? "open" : "create",
>> + err);
>> + free(err);
>> + return PGRES_POLLING_FAILED;
>> + }
>> +
>> + /*
>> + * Configure cipher policy.
>> + */
>> + status = NSS_SetDomesticPolicy();
>
> Why is "domestic" the right thing here?

Historically there are three cipher policies in NSS: Domestic, Export and
France. These would enable a set of ciphers based on US export restrictions
(domest/export) or French import restrictions. All ciphers would start
disabled and then the ciphers belonging to the chosen set would be enabled.
Long ago, that was however removed and they now all get enabled by calling
either of these three functions. NSS_SetDomesticPolicy enables all implemented
ciphers, and the other calls just call NSS_SetDomesticPolicy, I guess that API
was kept for backwards compatibility. The below bugzilla entry has a bit more
information on this:

https://bugzilla.mozilla.org/show_bug.cgi?id=848384

That being said, the comment in the code did not reflect that, so I've reworded
it hoping it will be clearer now.

>> +
>> + PK11_SetPasswordFunc(PQssl_passwd_cb);
>
> Is it actually OK to do stuff like this when other users of NSS might be
> present? That's obviously more likely in the libpq case, compared to the
> backend case (where it's also possible, of course). What prevents us
> from overriding another user's callback?

The password callback pointer is stored in a static variable in NSS (in the
file lib/pk11wrap/pk11auth.c).

>> +ssize_t
>> +pgtls_read(PGconn *conn, void *ptr, size_t len)
>> +{
>> + PRInt32 nread;
>> + PRErrorCode status;
>> + int read_errno = 0;
>> +
>> + nread = PR_Recv(conn->pr_fd, ptr, len, 0, PR_INTERVAL_NO_WAIT);
>> +
>> + /*
>> + * PR_Recv blocks until there is data to read or the timeout expires. Zero
>> + * is returned for closed connections, while -1 indicates an error within
>> + * the ongoing connection.
>> + */
>> + if (nread == 0)
>> + {
>> + read_errno = ECONNRESET;
>> + return -1;
>> + }
>
> It's a bit confusing to talk about blocking when the socket presumably
> is in non-blocking mode, and you're also asking to never wait?

Fair enough, I can agree that the wording isn't spot on. The socket is
non-blocking while PR_Recv can block (which is what we ask it not to). I've
reworded and moved the comment around to hopefully make it clearer.

>> + if (nread == -1)
>> + {
>> + status = PR_GetError();
>> +
>> + switch (status)
>> + {
>> + case PR_WOULD_BLOCK_ERROR:
>> + read_errno = EINTR;
>> + break;
>
> Uh, isn't this going to cause a busy-loop by the caller? EINTR isn't the
> same as EAGAIN/EWOULDBLOCK?

Right, that's clearly not right.

>> + case PR_IO_TIMEOUT_ERROR:
>> + break;
>
> What does this mean? We'll return with a 0 errno here, right? When is
> this case reachable?

It should, AFAICT, only be reachable when PR_Recv is used with a timeout which
we don't do. It mentioned somewhere that it had happened in no-wait calls due
to a bug, but I fail to find that reference now. Either way, I've removed it
to fall into the default error handling which now sets errno correctly as that
was a paddle short here.

> E.g. the comment in fe-misc.c:
> /* pqsecure_read set the error message for us */
> for this case doesn't seem to be fulfilled by this.

Fixed, I hope.

>> +/*
>> + * Verify that the server certificate matches the hostname we connected to.
>> + *
>> + * The certificate's Common Name and Subject Alternative Names are considered.
>> + */
>> +int
>> +pgtls_verify_peer_name_matches_certificate_guts(PGconn *conn,
>> + int *names_examined,
>> + char **first_name)
>> +{
>> + return 1;
>> +}
>
> Uh, huh? Certainly doesn't verify anything...

Doh, the verification was done as part of the cert validation callback and I
had missed moving it to the stub. Fixed and also expanded to closer match how
it's done in the OpenSSL implementation.

>> +/* ------------------------------------------------------------ */
>> +/* PostgreSQL specific TLS support functions */
>> +/* ------------------------------------------------------------ */
>> +
>> +/*
>> + * TODO: this a 99% copy of the same function in the backend, make these share
>> + * a single implementation instead.
>> + */
>> +static char *
>> +pg_SSLerrmessage(PRErrorCode errcode)
>> +{
>> + const char *error;
>> +
>> + error = PR_ErrorToName(errcode);
>> + if (error)
>> + return strdup(error);
>> +
>> + return strdup("unknown TLS error");
>> +}
>
> Btw, why does this need to duplicate strings, instead of returning a
> const char*?

No, it doesn't, and no longer does.

The attached includes fixes for the above mentioned issues (and a few small
other ones I stumbled across), hopefully without introducing too many new. As
mentioned, I'll perform the split into multiple patches in a separate version
which only performs a split to make it easier to diff the individual patchfile
versions.

cheers ./daniel

Attachment Content-Type Size
0001-Support-for-NSS-as-a-TLS-backend-v13.patch application/octet-stream 202.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2020-10-27 20:18:29 Re: Support for NSS as a libpq TLS backend
Previous Message Thomas Munro 2020-10-27 20:00:17 Re: cutting down the TODO list thread