Re: Request for comment on setting binary format output per session

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: Dave Cramer <davecramer(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Request for comment on setting binary format output per session
Date: 2023-10-09 20:25:32
Message-ID: dcd25c5b805735378cf846f0178bb635716a5ed1.camel@j-davis.com
Lists: pgsql-hackers

On Wed, 2023-10-04 at 15:10 -0400, Robert Haas wrote:
> I hadn't really considered client_encoding as a precedent for this
> setting. A lot of my discomfort with the proposed mechanism also
> applies to client_encoding, namely, suppose you call some function or
> procedure or whatever and it changes client_encoding on your behalf
> and now your communication with the server is all screwed up.

This may have some security implications, but we've had lots of
discussion about the general topic of executing malicious code, and the
ability to mess with the on-the-wire formats might not be any worse
than what can already happen. (Though expanding it to binary formats
might slightly increase the attack surface area.)

> That
> seems very unpleasant. Yet it's also existing behavior.

The binary format setting is better in some ways and worse in other
ways.

For text encoding, the client usually expects a single encoding, so a
single setting at the start of the session makes sense. For binary
formats, the client is likely to support some values in binary and
others not; and user-defined types make it even messier.

On the other hand, at least the results are marked as being binary
format, so if something unexpected happens, a well-written client is
more likely to see that something went wrong. For text encoding, the
client would have to be a bit more defensive.
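
A minimal sketch of the kind of defensive check a client could make,
assuming it already has the body of a RowDescription message (the bytes
after the type byte and length word) from a simple query or a portal
Describe. The function names and the binary_ok set are hypothetical;
only the field layout comes from the documented protocol.

```python
import struct

# Fixed-size tail of each RowDescription field, after the NUL-terminated name:
# table OID (int32), column number (int16), type OID (int32),
# type length (int16), type modifier (int32), format code (int16).
FIELD_TAIL = struct.Struct(">IhIhih")


def field_formats(body: bytes):
    """Return (name, type_oid, format_code) for each field in the message body."""
    (nfields,) = struct.unpack_from(">h", body, 0)
    pos = 2
    fields = []
    for _ in range(nfields):
        end = body.index(b"\x00", pos)
        # Assumption: column names are sent in UTF-8.
        name = body[pos:end].decode("utf-8")
        pos = end + 1
        _tbl, _col, type_oid, _len, _mod, fmt = FIELD_TAIL.unpack_from(body, pos)
        pos += FIELD_TAIL.size
        fields.append((name, type_oid, fmt))
    return fields


def unexpected_binary(body: bytes, binary_ok: set):
    """Names of columns marked binary (format code 1) whose type OID
    the client has not said it can decode in binary."""
    return [name for name, type_oid, fmt in field_formats(body)
            if fmt == 1 and type_oid not in binary_ok]
```

A client that gets a non-empty list back can raise an error instead of
misreading the column. Text encoding has no comparable per-result marker
to check.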

Another thing to consider is that using a GUC for binary formats is a
protocol change in a way that client_encoding is not. The existing
documentation for the protocol already specifies when binary formats
will be used, and a GUC would change that behavior. We absolutely would
need to update the documentation, and clients (like psql) really should
be updated.

> I think one
> could conclude on these facts either that (a) client_encoding is fine
> and the problems with controlling behavior using that kind of
> mechanism are mostly theoretical or 

I'm not clear on the exact rules for a protocol version bump and why a
GUC helps us avoid one. If we have a binary_formats GUC, the client
would need to know the server version and check that it's >=17 before
sending the "SET binary_formats='...'" command, right? What's the
difference between that and making it an explicit protocol message that
only >=17 understand?
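
To make the question concrete, here is a rough sketch of the client-side
gate, assuming the client has read server_version_num and that the
proposed binary_formats GUC lands in version 17. The function name and
constant are hypothetical, and the value is passed through untouched
because its syntax is not settled here.

```python
from typing import Optional

# Assumption: the proposed GUC first appears in version 17,
# i.e. server_version_num >= 170000.
MIN_VERSION_NUM = 170000


def binary_formats_command(server_version_num: int, value: str) -> Optional[str]:
    """Return the SET command to send, or None if the server is too old."""
    if server_version_num < MIN_VERSION_NUM:
        return None
    # Double embedded single quotes; assumes standard_conforming_strings = on.
    quoted = value.replace("'", "''")
    return f"SET binary_formats = '{quoted}'"
```

Either way the client branches on the server version; with a GUC the
branch guards a SET command, and with a protocol message it would guard
the message instead.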

In any case, I think clients and connection poolers can work around the
problems, and they are mostly minor in practice, but I wouldn't call
them "theoretical". If there's enough utility in the binary_formats
parameter, we can decide to put up with the problems; which is
different than saying there aren't any.

> (b) that we messed up with
> client_encoding and shouldn't add any more mistakes of the same ilk
> or
> (c) that we should really be looking at redesigning the way
> client_encoding works, too.

(b) doesn't seem like a very helpful perspective without some ideas
toward (c). I think (c) is worth discussing but we don't have to block
on it.

Regards,
Jeff Davis
