Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

From: Jelte Fennema-Nio <me(at)jeltef(dot)nl>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dave Cramer <davecramer(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jacob Burroughs <jburroughs(at)instructure(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Subject: Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Date: 2024-05-06 16:19:50
Message-ID: CAGECzQSgW0ZVpJuONh4MrwQHMtjAK2aeAaYsOuVdyMdU3=rmCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 23 Apr 2024 at 17:03, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> If a client doesn't know about encrypted columns and sets that
> bit at random, that will break things, and formally I think that's a
> risk, because I don't believe we document anywhere that you shouldn't
> set unused bits in the format mask. But practically, it's not likely.
> (And also, maybe we should document that you shouldn't do that.)

Currently Postgres errors out when you set anything other than 0 or 1
in the format field. And it's already documented that these are the
only allowed values: "Each must presently be zero (text) or one
(binary)."

> Let's consider a hypothetical country much like Canada except that
> there are three official languages rather than two: English, French,
> and Robertish.

I don't really understand the point you are trying to make with this
analogy. Sure an almost equivalent unused language maybe shouldn't be
put on the signs, but having two different names (aka protocol
versions) for English and Robertish seems quite useful to avoid
confusion.

> And so here. If someone codes a connection pooler in the way you
> suppose, then it will break. But, first of all, they probably won't do
> that, both because it's not particularly likely that someone wants to
> gather that particular set of statistics and also because erroring out
> seems like an overreaction.

Is it really that much of an overreaction? Postgres happily errors on
any weirdness in the protocol including unknown format codes. Why
wouldn't a pooler/proxy in the middle be allowed to do the same? The
pooler has no clue what the new format means, maybe it requires some
session state to be passed on to the server to be interpreted
correctly. The pooler would be responsible for syncing that session
state but might not have synced it because it doesn't know that the
format code requires that. An example of this would be a compressed
value of which the compression algorithm is configured using a GUC.
Thus the pooler being strict in what it accepts would be a good way of
notifying the client that the pooler needs to be updated (or the
feature not used).

But I do agree that it seems quite unlikely that a pooler would
implement it like that though. However, a pooler logging a warning
seems not that crazy. And in that case the connection might not be
closed, but the logs would be flooded.

> And secondly, let's imagine that we do
> bump the protocol version and think about whether and how that solves
> the problem. A client will request from the pooler a version 3.1
> connection and the pooler will say, sorry, no can do, I only
> understand 3.0. So the client will now say, oh ok, no problem, I'm
> going to refrain from setting that parameter format bit. Cool, right?
>
> Well, no, not really. First, now the client application is probably
> broken. If the client is varying its behavior based on the server's
> protocol version, that must mean that it cares about accessing
> encrypted columns, and that means that the bit in question is not an
> optional feature. So actually, the fact that the pooler can force the
> client to downgrade hasn't fixed anything at all.

It seems quite a lot nicer if the client can safely fallback to not
writing data using the new format code, instead of getting an error
from the pooler/flooding the pooler logs/having the pooler close the
connection.

> Third, applications, drivers, and connection poolers now all need to
> worry about handling downgrades smoothly. If a connection pooler
> requests a v3.1 connection to the server and gets v3.0, it had better
> make sure that it only advertises 3.0 to the client.

This seems quite straightforward to solve from a pooler perspective:
1. Connect to the server first to find out what the maximum version is
that it support
2. If a client asks for a higher version, advertise the server version

> If the client
> requests v3.0, the pooler had better make sure to either request v3.0
> from the server. Or alternatively, the pooler can be prepared to
> translate between 3.0 and 3.1 wherever that's needed, in either
> direction. But it's not at all clear what that would look like for
> something like TCE. Will the pooler arrange to encrypt parameters
> destined for encrypted tables if the client doesn't do so? Will it
> arrange to decrypt values coming from encrypted tables if the client
> doesn't understand encryption?

This is a harder problem, and indeed one I hadn't considered much.
From a pooler perspective I really would like to be able to have just
one pool of connections to the server all initially connected with the
3.1 protocol and use those 3.1 server connections for clients that use
the 3.1 protocol but also for clients that use the 3.0 protocol.
Creating separate pools of server connections for different protocol
versions is super annoying to manage for operators (e.g. how should
these pools be sized). One of the reasons I'm proposing the
ParameterSet message is to avoid this problem for protocol parameters
e.g. PgBouncer could then set TCE=off on server connection with
TCE=on, just before it hands it to a client with TCE=off.

I can think of a few ways of solving this issue for protocol versions:
1. Introduce a ProtocolVersionSet message, which could be send on
handoff to downgrade/upgrade the protocol version at the postgres side
2. Feature-flag all protocol changes behind protocol parameters, so
ParameterSet can be used to enable/disable everything
3. Feature-flag most protocol changes using protocol parameters, but
allow protocol changes to be made using a version bump when the server
responds the exact same way to all messages a client can send using
the previous protocol version. So don't allow a protocol version bump
to add extra fields to existing messages, nor introduce new message
types that are not sent as a 1-to-1 response to a new message type
sent by the client, nor send existing message types more often than
was expected in the previous protocol version.

I would prefer the 3rd option. 1 seems strange when considering
protocol changes that only impact the handshake, such as lengthening
the cancel key[1], also it requires adding yet another message type.
And 2 seems an overly strict version of 3.

[1]: https://www.postgresql.org/message-id/flat/508d0505-8b7a-4864-a681-e7e5edfe32aa%40iki.fi

> It's possible someone will code that
> sort of thing, but I bet a lot of people won't bother. In general, I
> think we'll quickly end up with a bunch of different protocol versions
> -- say, 3.0 through 3.4 -- but people will thoroughly test with only
> one or two of them and support for the others will either be buggy
> because it wasn't tested or work anyway because the differences didn't
> really matter in the first place.

I think this is overly pessimistic. I'm pretty sure clients and
poolers will want to support multiple protocol versions, to be able to
talk to old clients/servers. I do think we should make supporting
multiple versions as easy as possible though.

> In practice, it's already the case. If such databases don't share code
> with PostgreSQL, it seems impossible that the replication subprotocol
> works in any meaningful way.

I think the fact that the replication subprotocol is gated behind the
"replication=true" StartupMessage parameter makes it very easy to
check for support.

> It seems very likely that there are other
> dark corners of the protocol where things don't work either. And TCE
> will be another one, but bumping the protocol version doesn't fix
> that.

To be clear, gating the new TCE format code behind a protocol version
bump is **not only useful detect non-support for TCE** in such other
servers. But it can also be used to detect that this other server
actually **does support TCE**. If the postgresql server version is
used to indicate such support, then these non-Postgres servers now
need to pretend that they actually are Postgres servers by sending the
same version number in the server_version GUC ParameterStatus message.

> But I can't understand
> why you don't see practical problems with frequent version bumps. It's
> not just about the one-time effort of getting everything that doesn't
> currently understand how to negotiate a version to do so. It's about
> how everyone acts on that information, or doesn't, and whether the end
> result of all of those individual decisions is better or worse for the
> community as a whole.

I do see practical problems. But I see the exact same practical
problems when encoding new protocol feature support in the postgres
server version number instead of the protocol version number. But
encoding protocol feature support in the server version introduces
other issues, such as not being able to detect that some non-Postgres
server supports TCE.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2024-05-06 16:22:25 Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Previous Message Justin Pryzby 2024-05-06 16:12:07 Re: pg17 issues with not-null contraints