Re: Experiments with Postgres and SSL

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: Experiments with Postgres and SSL
Date: 2024-04-08 01:25:54
Message-ID: 0add7c94-7ece-4b63-b9af-515858ddf856@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Committed this. Thank you to everyone involved!

On 04/04/2024 14:08, Matthias van de Meent wrote:
> Patch 0003:
>
> The read size in secure_raw_read is capped to port->raw_buf_remaining
> if the raw buf has any data. While the user will probably call into
> this function again, I think that's a waste of cycles.

Hmm, yeah, I suppose we could read more data in the same call. It seems
simpler not to. The case that "raw_buf_remaining > 0" is a very rare.

> pq_buffer_has_data now doesn't have any protections against
> desynchronized state between PqRecvLength and PqRecvPointer. An
> Assert(PqRecvLength >= PqRecvPointer) to that value would be
> appreciated.

Added.

> 0006:
>
> In CONNECTION_FAILED, we use connection_failed() to select whether we
> need a new connection or stop trying altogether, but that function's
> description states:
>
>> + * Out-of-line portion of the CONNECTION_FAILED() macro
>> + *
>> + * Returns true, if we should retry the connection with different encryption method.
>
> Which to me reads like we should reuse the connection, and try a
> different method on that same connection. Maybe we can improve the
> wording to something like
> + * Returns true, if we should reconnect with a different encryption method.
> to make the reconnect part more clear.

Changed to "Returns true, if we should reconnect and retry with a
different encryption method".

> In select_next_encryption_method, there are several copies of this pattern:
>
> if ((remaining_methods & ENC_METHOD) != 0)
> {
> conn->current_enc_method = ENC_METHOD;
> return true;
> }
>
> I think a helper macro would reduce the verbosity of the scaffolding,
> like in the attached SELECT_NEXT_METHOD.diff.txt.

Applied.

In addition to the above, I made heavy changes to the tests. I wanted to
test not just the outcome (SSL, GSSAPI, plaintext, or fail), but also
the steps and reconnections needed to get there. To facilitate that, I
rewrote how the expected outcome was represented in the test script. It
now uses a table-driven approach, with a line for each test iteration,
ie. for each different combination of options that are tested.

I then added some more logging, so that whenever the server receives an
SSLRequest or GSSENCRequest packet, it logs a line. That's controlled by
a new not-in-sample GUC ("trace_connection_negotiation"), intended only
for the test and debugging. The test scrapes the log for the lines that
it prints, and the expected output includes a compact trace of expected
events. For example, the expected output for "user=testuser
gssencmode=prefer sslmode=prefer sslnegotiation=direct", when GSS and
SSL are both disabled in the server, looks like this:

# USER GSSENCMODE SSLMODE SSLNEGOTIATION EVENTS -> OUTCOME
testuser prefer prefer direct connect,
directsslreject, reconnect, sslreject, authok -> plain

That means, we expect libpq to first try direct SSL, which is rejected
by the server. It should then reconnect and attempt traditional
negotiated SSL, which is also rejected. Finally, it should try plaintext
authentication, without reconnecting, which succeeds.

That actually revealed a couple of slightly bogus behaviors with the
current code. Here's one example:

# XXX: libpq retries the connection unnecessarily in this case:
nogssuser require allow connect, gssaccept, authfail,
reconnect, gssaccept, authfail -> fail

That means, with "gssencmode=require sslmode=allow", if the server
accepts the GSS encryption but refuses the connection at authentication,
libpq will reconnect and go through the same motions again. The second
attempt is pointless, we know it's going to fail. The refactoring to the
libpq state machine fixed that issue as a side-effect.

I removed the server ssl_enable_alpn and libpq sslalpn options. The idea
was that they could be useful for testing, but we didn't actually have
any tests that would use them, and you get the same result by testing
with an older server or client version. I'm open to adding them back if
we also add tests that benefit from them, but they were pretty pointless
as they were.

One important open item now is that we need to register a proper ALPN
protocol ID with IANA.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-04-08 01:28:31 Re: Experiments with Postgres and SSL
Previous Message Tom Lane 2024-04-08 01:25:38 Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan