Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jelte Fennema-Nio <me(at)jeltef(dot)nl>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jacob Burroughs <jburroughs(at)instructure(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dave Cramer <davecramer(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Subject: Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Date: 2024-03-14 20:33:41
Message-ID: CA+TgmobCj-+BQH7Bd1NNB3p-Z1iz4q4HdvNkBHO0EfD3BgYgqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 14, 2024 at 1:54 PM Jelte Fennema-Nio <me(at)jeltef(dot)nl> wrote:
> In my view there can be, **by definition,** only two general types of
> protocol changes:
> 1. A "simple protocol change": This is one that requires agreement by
> both the client and server, that they understand the new message types
> involved in this change. e.g. the ParameterSet message proposal (this
> message type is either supported or it's not)
> 2. A "parameterized protocol change": This requires the same as 1, but
> should also allow some parameterization from the client, e.g. for the
> compression proposal, the client should specify what compression
> algorithm the server should use to compress data when sending it to
> the client.

You seem to be supposing here that all protocol changes will consist
of adding new message types. While I think that will be a common
pattern, I do not think it will be a universal one. I do agree,
however, that every possible variation of the protocol is either
Boolean-valued (i.e. are we doing X or not?) or something more
complicated (i.e. how exactly are doing X?).

> Client and Server can agree that a "simple protocol change" is
> supported by both advertising a minimum minor protocol version. And
> for a "parameterized protocol change" the client can send a _pq_
> parameter in the startup packet.
>
> So, new _pq_ parameters should only ever be introduced for
> parameterized protocol changes. They are not meant to advertise
> support, they are meant to configure protocol features. For a
> non-configurable protocol feature, we'd simply bump the protocol
> version. And since _pq_ parameters thus always control some setting at
> the session level, we can simply store it as a GUC, because that's how
> we store all our parameters at a session level.

This is definitely not how I was thinking about it. I was imagining
that we wanted to reserve bumping the protocol version for more
significant changes, and that we'd use _pq_ parameters for relatively
minor new functionality whether Boolean-valued or otherwise.

> I think given the automatic downgrade supported by the
> NegotiateProtocolVersion, there's no real down-side to requesting the
> newest version by default. The only downside I can see is when
> connecting to other applications (i.e. non PostgreSQL servers) that
> implement the postgres protocol, but don't implement
> NegotiateProtocolVersion. But for that I added the
> max_protocol_version connection option in 0006 (of my latest
> patchset).

You might be right. This is a minor point that's not worth arguing
about too much.

> > I'm really unhappy with 0002-0004.
>
> Just to be clear, (afaict) your argument below seems to only really be
> about 0004, not about 0002 or 0003. Was there anything about 0002 &
> 0003 that you were unhappy with? 0002 & 0003 are not dependent an 0004
> imho. Because even when not making _pq_ parameters map to GUCs, we'd
> still need to change libpq to not instantly close the connection
> whenever a _pq_ parameter (or new protocol version) is not supported
> by the server (which is what 0002 & 0003 do).

I completely agree that we need to avoid slamming the connection shut.
What I don't agree with is taking the list of protocol extensions that
the server knows about and putting it into an array of strings that
the user can see. I don't want the user to know or care so much about
what's happening at the wire protocol level. The user is entitled to
know whether PQsetProtocolParameter() will work or not, and the user
is entitled to know whether it has a chance of working next time if it
didn't work this time, and when it fails, the user is entitled to a
good error message explaining the reason for the failure. But the user
is not entitled to know what negotiation took place over the wire to
figure that out. They shouldn't need to know that the _pq_ namespace
exists, and they shouldn't need to know whether we negotiated the
availability or unavailability of PQsetProtocolParameter() using [a]
the protocol minor version number, [b] the protocol major version
number, [c] a protocol option called parameter_set, or [d] a protocol
option called bananas_foster. Those are all things that might change
in the future.

Just as a for instance, I had a thought that we might accumulate a few
new message types controlled by protocol options (ParameterSet,
AlwaysSendTypeInBinaryFormat, whatever else) while keeping the
protocol version as 3.0, and then eventually bump the protocol version
to 3.1 where all of that would be mandatory functionality. So the
protocol parameters wouldn't be specified any more when using 3.1, but
they would be specified when talking to older 3.0 servers. That
difference shouldn't be visible to the user. The user should only know
the result of the negotiation.

> Okay, our interpretation is very different here. From my perspective
> introducing a non-GUC namespace is NOT AT ALL the benefit of the _pq_
> prefix. The main benefit (imho) is that it allows sending an
> "optional" parameter (i.e GUC) in the startup packet. So, one where if
> the server doesn't recognize it the connection attempt still succeeds.
> If you specify a normal GUC in the connection parameters and the
> server doesn't know about it, the server will close the connection.

But this is another example of a problem that can *easily* be fixed
without using up the entirety of the _pq_ namespace. You can introduce
_pq_.dont_error_on_unknown_gucs=1 or
_pq_.dont_error_on_these_gucs='foo, bar, baz'. The distinction between
the startup packet containing whizzbang=frob and instead containing
_pq_.whizzbang=frob shouldn't be just whether an error is thrown if we
don't know anything about whizzbang.

> > ... but no matter what we do exactly, I don't want the very first
> > protocol extension we ever add to eat up all of _pq_. I intended that
> > to support decades worth of extensibility, not just one patch.
>
> This seems to be the core of your argument. But honestly, I don't
> understand this logic at all. Why do you think that assigning _pq_
> parameters to GUCs **for the ones that match an existing GUC** would
> have such a far reaching effect into the future. There's only a
> handful of _pq_ parameters being proposed on the mailinglist at the
> moment. Even if we implement all of those as GUCs, and in the future,
> we'd want a _pq_ parameter that does not map to GUC (which I
> personally doubt will ever be the case). Then we can simply change the
> server code like so and do something special for that parameter:

I guess I'm in the same position as you are -- your argument doesn't
really make any sense to me. That also has the unfortunate
disadvantage of making it difficult for me to explain why I don't
agree with you, but let me just tick off a few things that I'm
thinking about here:

1. Connection poolers. If I'm talking to pgpool and pgpool is talking
to the server, and pgpool and I agree to use compression, that's
completely separate from whether pgpool and the server are using
compression. If I have to interrogate the compression state by
executing "SHOW some_compression_guc", then I'm going to get the wrong
answer unless pgpool runs a full SQL parser on every command that I
execute and intercepts the ones that touch protocol parameters. That's
bound to be expensive an unreliable -- consider something like SELECT
current_setting('some_compression_guc') || ' ' |
current_setting('some_other_guc') which isn't half as pathological as
it first looks. I want to be able to know the state of my protocol
parameters by calling libpq functions that answer my questions
definitively based on libpq's own internal state. libpq itself *must*
know what compression I'm using on my connection; the server's answer
may be different.

2. Clarity of meaning across versions. Let's say we add a protocol
option in the future that expands the message-type byte into a
two-byte word. Failure of the two sides to agree on the value of this
protocol option is, fairly obviously, a catastrophe. I assume that if
we actually did something like this, there's a fair chance it would be
protocol version 4.0 rather than an option to 3.whatever, but it's a
good example of something that might someday need to be changed that
is not just a new message type and about which the communicating
parties must absolutely agree. Let's say we do as you propose and have
a GUC wide_message_types = {true | false}. Now, what happens when a
sneaky user of an older libpq, which does not know about this option,
tries to connect to a newer server? As I see it, in your proposal, the
client thinks they're just setting a GUC, but the server thinks we're
completely changing up the wire protocol. Disaster ensues. From my
point of view, the problem is created by the fact that you're mixing
together two things which ought to be kept well-separated -- the act
of negotiating what protocol variant we're using, on the one hand, and
the setting of particular GUCs to particular values, on the other.

3. Generally, and maybe this is just an expansion of the previous
point, it feels to me like you've conflated the thing you want to do
right now with what everybody who wants to modify the protocol will
ever want to do in the future. It's just all GUCs, all the time! But
the GUC model is actually a poor fit in all kinds of scenarios, which
is why we have all kinds of other ways to configure things too, like
connection parameters for instance. Now, to be fair, it's often useful
to expose values that are configured through some other means as
read-only GUCs, so the dividing line between GUCs and other things
does get a bit sloppy. And we're already using client_encoding, which
is a GUC, for something that really ought not to have been handled
that way ... because to take the connection pooler example again,
there's no reason -- other than bad wire-protocol design -- why the
encoding being used between the client and the pooler needs to match
the encoding being used between the pooler and the server. But in my
view, this isn't evidence that we should continue to muddy the
distinction between things that ought to be protocol parameters and
things that ought to be GUCs; rather, it's evidence of the need to
make the distinction between the two as crisp as we possibly can.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anton A. Melnikov 2024-03-14 20:36:12 Re: Add the ability to limit the amount of memory that can be allocated to backends.
Previous Message Thomas Munro 2024-03-14 20:28:19 Re: Recent 027_streaming_regress.pl hangs