Re: Support for NSS as a libpq TLS backend

From: Jacob Champion <pchampion(at)vmware(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Support for NSS as a libpq TLS backend
Date: 2020-11-12 22:12:42
Message-ID: FBA0C365-3B19-4E40-98F4-8B0064DB7127@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Nov 11, 2020, at 10:57 AM, Jacob Champion <pchampion(at)vmware(dot)com> wrote:
>
> False alarm -- the stderr debugging I'd added in to track down the
> assertion tripped up the "no stderr" tests. Zero failing tests now.

I took a look at the OpenSSL interop problems you mentioned upthread. I
don't see a hang like you did, but I do see a PR_IO_TIMEOUT_ERROR during
connection.

I think pgtls_read() needs to treat PR_IO_TIMEOUT_ERROR as if no bytes
were read, in order to satisfy its API. There was some discussion on
this upthread:

On Oct 27, 2020, at 1:07 PM, Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
>
> On 20 Oct 2020, at 21:15, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>>> + case PR_IO_TIMEOUT_ERROR:
>>> + break;
>>
>> What does this mean? We'll return with a 0 errno here, right? When is
>> this case reachable?
>
> It should, AFAICT, only be reachable when PR_Recv is used with a timeout which
> we don't do. It mentioned somewhere that it had happened in no-wait calls due
> to a bug, but I fail to find that reference now. Either way, I've removed it
> to fall into the default error handling which now sets errno correctly as that
> was a paddle short here.

PR_IO_TIMEOUT_ERROR is definitely returned in no-wait calls on my
machine. It doesn't look like the PR_Recv() API has a choice -- if
there's no data, it can't return a positive integer, and returning zero
means that the socket has been disconnected. So -1 with a timeout error
is the only option.

I'm not completely sure why this is exposed so easily with an OpenSSL
server -- I'm guessing the implementation slices up its packets
differently on the wire, causing a read event before NSS is able to
decrypt a full record -- but it's worth noting that this case also shows
up during NSS-to-NSS psql connections, when handling notifications at
the end of every query. PQconsumeInput() reports a hard failure with the
current implementation, but its return value is ignored by
PrintNotifications(). Otherwise this probably would have showed up
earlier.

(What's the best way to test this case? Are there lower-level tests for
the protocol/network layer somewhere that I'm missing?)

While patching this case, I also noticed that pgtls_read() doesn't call
SOCK_ERRNO_SET() for the disconnection case. That is also in the
attached patch.

--Jacob

Attachment Content-Type Size
nss-handle-timeouts-and-disconnections-in-pgtls_read.patch application/octet-stream 2.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-11-12 22:16:33 Re: Bogus documentation for bogus geometric operators
Previous Message Peter Geoghegan 2020-11-12 22:00:07 Re: Deleting older versions in unique indexes to avoid page splits