Re: Experiments with Postgres and SSL

From: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Jacob Champion <jchampion(at)timescale(dot)com>, Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: Experiments with Postgres and SSL
Date: 2024-03-05 18:57:31
Message-ID: CAOYmi+=cnV-8V8TndSkEF6Htqa7qHQUL_KnQU8-DrT0Jjnm3_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 5, 2024 at 6:09 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> I hope I didn't joggle your elbow reviewing this

Nope, not at all!

> The tests are still not distinguishing whether a connection was
> established in direct or negotiated mode. So if we e.g. had a bug that
> accidentally disabled direct SSL connection completely and always used
> negotiated mode, the tests would still pass. I'd like to see some tests
> that would catch that.

+1

On Mon, Mar 4, 2024 at 7:29 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 01/03/2024 23:49, Jacob Champion wrote:
> > I'll squint more closely at the MITM-protection changes in 0008 later.
> > First impressions, though: it looks like that code has gotten much
> > less straightforward, which I think is dangerous given the attack it's
> > preventing. (Off-topic: I'm skeptical of future 0-RTT support. Our
> > protocol doesn't seem particularly replay-safe to me.)
>
> Let's drop that patch. AFAICS it's not needed by the rest of the patches.

Okay, sounds good.

> > If we're interested in ALPN negotiation in the future, we may also
> > want to look at GREASE [1] to keep those options open in the presence
> > of third-party implementations. Unfortunately OpenSSL doesn't do this
> > automatically yet.
>
> Can you elaborate?

Sure: now that we're letting middleboxes and proxies inspect and react
to connections based on ALPN, it's possible that some intermediary
might incorrectly fixate on the "postgres" ID (or whatever we choose
in the end), and shut down connections that carry additional protocols
rather than ignoring them. That would prevent future graceful upgrades
where the client sends both "postgres/X" and "postgres/X+1". While
that wouldn't be our fault, it'd be cold comfort to whoever has that
middlebox.

GREASE is a set of reserved protocol IDs that you can add randomly to
your ALPN list, so any middleboxes that fail to follow the rules will
just break outright rather than silently proliferating. (Hence the
pun: GREASE keeps the joints in the pipe from rusting into place.) The
RFC goes into more detail about how to do it. And I don't know if it's
necessary for a v1, but it'd be something to keep in mind.

> Do we need to do something extra in the server to be
> compatible with GREASE?

No, I think that as long as we use OpenSSL's APIs correctly on the
server side, we'll be compatible by default. This would be a
client-side implementation, to push random GREASE strings into the
ALPN list. (There is a risk that if/when OpenSSL finally starts
supporting this transparently, we'd need to remove it from our code.)

> > If we don't have a reason not to, it'd be good to follow the strictest
> > recommendations from [2] to avoid cross-protocol attacks. (For anyone
> > currently running web servers and Postgres on the same host, they
> > really don't want browsers "talking" to their Postgres servers.) That
> > would mean checking the negotiated ALPN on both the server and client
> > side, and failing if it's not what we expect.
>
> Hmm, I thought that's what the patches does. But looking closer, libpq
> is not checking that ALPN was used. We should add that. Am I right?

Right. Also, it looks like the server isn't failing the TLS handshake
itself, but instead just dropping the connection after the handshake.
In a cross-protocol attack, there's a danger that the client (which is
not speaking our protocol) could still treat the server as
authoritative in that situation.

> > I'm not excited about the proliferation of connection options. I don't
> > have a lot of ideas on how to fix it, though, other than to note that
> > the current sslnegotiation option names are very unintuitive to me:
> > - "postgres": only legacy handshakes
> > - "direct": might be direct... or maybe legacy
> > - "requiredirect": only direct handshakes... unless other options are
> > enabled and then we fall back again to legacy? How many people willing
> > to break TLS compatibility with old servers via "requiredirect" are
> > going to be okay with lazy fallback to GSS or otherwise?
>
> Yeah, this is my biggest complaint about all this. Not so much the names
> of the options, but the number of combinations of different options, and
> how we're going to test them all. I don't have any great solutions,
> except adding a lot of tests to cover them, like Matthias did.

The default gssencmode=prefer is especially problematic if I'm trying
to use sslnegotiation=requiredirect for security. It'll appear to work
at first, but if somehow I get a credential cache into my environment,
libpq will suddenly fall back to plaintext negotiation :(

> > (Plus, we need to have a good error message when connecting to older
> > servers anyway.I think we should be able to key off of the EOF coming
> > back from OpenSSL; it'd be a good excuse to give that part of the code
> > some love.)
>
> Hmm, if OpenSSL sends ClientHello and the server responds with a
> Postgres error packet, OpenSSL will presumably consume the error packet
> or at least part of it. But with our custom BIO, we can peek at the
> server response before handing it to OpenSSL.

I don't think an error packet is going to come back with the
currently-shipped implementations. IIUC, COMMERROR packets are
swallowed instead of emitted before authentication completes. So I see
EOFs when trying to connect to older servers. Do you know of any
situations where we'd see an actual error message on the wire?

> If it helps, we could backport a nicer error message to old server
> versions, similar to what we did with SCRAM in commit 96d0f988b1.

That might be nice regardless, instead of pushing "invalid length of
startup packet" into the logs.

> > For the record, I'm adding some one-off tests for this feature to a
> > local copy of my OAuth pytest suite, which is designed to do the kinds
> > of testing you're running into trouble with. It's not in any way
> > viable for a PG17 commit, but if you're interested I can make the
> > patches available.
>
> Yes please, it would be nice to see what tests you've performed, and
> have it archived.

I've cleaned it up a bit and put it up at [1]. (If you want, I can
attach the GitHub-generated ZIP, so the mailing list has a snapshot.)

These include happy-path tests for direct SSL, some failure modes, and
an example test that combines the GSS and SSL negotiation paths. So
there might be test bugs, but with the v8 patchset, I see the
following failures:

> FAILED client/test_tls.py::test_direct_ssl_without_alpn - AssertionError: client sent unexpected data

I.e. the client doesn't disconnect if the server doesn't select our protocol.

> FAILED client/test_tls.py::test_direct_ssl_failed_negotiation[direct-True] - AssertionError: Regex pattern did not match.
> FAILED client/test_tls.py::test_direct_ssl_failed_negotiation[requiredirect-False] - AssertionError: Regex pattern did not match.
> FAILED client/test_tls.py::test_gssapi_negotiation - AssertionError: Regex pattern did not match.

These are complaining about the "SSL SYSCALL error: EOF detected"
error messages that the client returns.

> FAILED server/test_tls.py::test_direct_ssl_without_alpn[no application protocols] - Failed: DID NOT RAISE <class 'ssl.SSLError'>
> FAILED server/test_tls.py::test_direct_ssl_without_alpn[incorrect application protocol] - Failed: DID NOT RAISE <class 'ssl.SSLError'>

I.e. the server allows the handshake to complete without a proper ALPN
selection.

Thanks,
--Jacob

[1] https://github.com/jchampio/pg-pytest-suite

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2024-03-05 19:11:23 Re: Support a wildcard in backtrace_functions
Previous Message Bharath Rupireddy 2024-03-05 18:48:14 Re: Support a wildcard in backtrace_functions