Re: Experiments with Postgres and SSL

From: Greg Stark <stark(at)mit(dot)edu>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Jacob Champion <jchampion(at)timescale(dot)com>, Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Experiments with Postgres and SSL
Date: 2023-02-28 18:32:45
Message-ID: CAM-w4HNj-iGtS+Po1qy+OSqaNQGHJ7HFpX3svWMwp6J8qP3VUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 22 Feb 2023 at 07:27, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 20/01/2023 03:28, Jacob Champion wrote:
> > On Wed, Jan 18, 2023 at 7:16 PM Greg Stark <stark(at)mit(dot)edu> wrote:
> >> * "Service Mesh" type tools that hide multiple services behind a
> >> single host/port ("Service Mesh" is just a new buzzword for "proxy").
> >
> > If you want to multiplex protocols on a port, now is an excellent time
> > to require clients to use ALPN on implicit-TLS connections. (There are
> > no clients that can currently connect via implicit TLS, so you'll
> > never have another chance to force the issue without breaking
> > backwards compatibility.) That should hopefully make it harder to
> > ALPACA yourself or others [2].
>
> Good idea. Do we want to just require the protocol to be "postgres", or
> perhaps "postgres/3.0"? Need to register that with IANA, I guess.

I had never heard of this before, it does seem useful. But if I
understand it right it's entirely independent of this patch. We can
add it to all our Client/Server exchanges whether they're the initial
direct SSL connection or the STARTTLS negotiation?

> We implemented a protocol version negotiation mechanism in the libpq
> protocol itself, how would this interact with it? If it's just
> "postgres", then I guess we'd still negotiate the protocol version and
> list of extensions after the TLS handshake.
>
> >> Incidentally I find the logic in ProcessStartupPacket incredibly
> >> confusing. It took me a while before I realized it's using tail
> >> recursion to implement the startup logic. I think it would be way more
> >> straightforward and extensible if it used a much more common iterative
> >> style. I think it would make it possible to keep more state than just
> >> ssl_done and gss_done without changing the function signature every
> >> time for example.
> >
> > +1. The complexity of the startup logic, both client- and server-side,
> > is a big reason why I want implicit TLS in the first place. That way,
> > bugs in that code can't be exploited before the TLS handshake
> > completes.
>
> +1. We need to support explicit TLS for a long time, so we can't
> simplify by just removing it. But let's refactor the code somehow, to
> make it more clear.
>
> Looking at the patch, I think it accepts an SSLRequest packet even if
> implicit TLS has already been established. That's surely wrong, and
> shows how confusing the code is. (Or I'm reading it incorrectly, which
> also shows how confusing it is :-) )

I'll double check it but I think I tested that that wasn't the case. I
think it accepts the SSL request packet and sends back an N which the
client libpq just interprets as the server not supporting SSL and does
an unencrypted connection (which is tunneled over stunnel unbeknownst
to libpq).

I agree I would want to flatten this logic to an iterative approach
but having wrapped my head around it now I'm not necessarily rushing
to do it now. The main advantage of flattening it would be to make it
easy to support other protocol types which I think could be really
interesting. It would be much clearer to document the state machine if
all the state is in one place and the code just loops through
processing startup packets and going to a new state until the
connection is established. That's true now but you have to understand
how the state is passed in the function parameters and notice that all
the recursion is tail recursive (I think). And extending that state
would require extending the function signature which would get awkward
quickly.

> Regarding Vladimir's comments on how clients can migrate to this, I
> don't have any great suggestions. To summarize, there are several options:
>
> - Add an "fast_tls" option that the user can enable if they know the
> server supports it
>
> - First connect in old-fashioned way, and remember the server version.
> Later, if you reconnect to the same server, use implicit TLS if the
> server version was high enough. This would be most useful for connection
> pools.

Vladimir pointed out that this doesn't necessarily work. The server
may be new enough to support it but it could be behind a proxy like
pgbouncer or something. The same would be true if the server reported
a "connection option" instead of depending on version.

> - Connect both ways at the same time, and continue with the fastest,
> i.e. "happy eyeballs"

That seems way too complex for us to bother with imho.

> - Try implicit TLS first, and fall back to explicit TLS if it fails.

> For libpq, we don't necessarily need to do anything right now. We can
> add the implicit TLS support in a later version. Not having libpq
> support makes it hard to test the server codepath, though. Maybe just
> test it with 'stunnel' or 'openssl s_client'.

I think we should have an option to explicitly enable it in psql, if
only for testing. And then wait five years and switch the default on
it then. In the meantime users can just set it based on their setup.
That's not the way to the quickest adoption but imho the main
advantages of this option are the options it gives users, not the
latency improvement, so I'm not actually super concerned about
adoption rate.

I assume we'll keep the negotiated mode indefinitely because it can
handle any other protocols we might want. For instance, it currently
handles GSSAPI -- which raises the question, are we happy with GSSAPI
having this extra round trip? Is there a similar change we could make
for it? My understanding is that GSSAPI is an abstract interface and
the actual protocol it's invoking could be anything so we can't make
any assumptions about what the first packet looks like. Perhaps we can
do something about pipelining GSSAPI messages so if the negotiation
fails the server just closes the connection but if it accepts it it
does a similar trick with unreading the buffered data and processing
it through the GSSAPI calls.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kuntal Ghosh 2023-02-28 18:36:11 Re: Improve WALRead() to suck data directly from WAL buffers when possible
Previous Message Justin Pryzby 2023-02-28 18:25:08 Re: Memory leak from ExecutorState context?