Re: Transform groups (more FE/BE protocol issues)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgreSQL(dot)org, pgsql-interfaces(at)postgreSQL(dot)org
Subject: Re: Transform groups (more FE/BE protocol issues)
Date: 2003-05-05 14:51:50
Message-ID: 27698.1052146310@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-interfaces

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> The SQL standard defines a concept called "transform groups", which are
> basically pairs of input/output functions that can be switched between.
> The standard talks about different transform groups for different host
> languages, so this essentially selects between different binary output
> formats.

> I think this would fit naturally with many things we are doing and want to
> do.

I've been thinking about this a little more; it seems to open up some
questions about the current design of the new FE/BE protocol.

* There are two places presently in the protocol where the client can
specify text/binary via a boolean (represented as int8). To move to a
transform-group world, we could redefine that field as a group selector:
0 = standard text representation, 1 = standard binary representation,
other values = reserved for future implementation. The obvious
question is whether we should widen these fields to more than 8 bits.
Are we likely to need more than 256 transform groups? More than 64K?
(Read on before you answer, since some of the points below suggest we
might be transmitting a lot more of these fields than at present;
keeping them narrow might be important for bandwidth reasons.)
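
To make the width question concrete, here is a minimal Python sketch. It assumes the selector is sent as an unsigned integer in network byte order; the names FORMAT_TEXT, FORMAT_BINARY, encode_format_code and selector_overhead are hypothetical, and only the values 0 and 1 come from the proposal above.

```python
import struct

# Hypothetical names; only the meanings of 0 and 1 come from the proposal.
FORMAT_TEXT = 0    # standard text representation
FORMAT_BINARY = 1  # standard binary representation

# struct codes for candidate field widths (unsigned, network byte order).
WIDTH_CODES = {1: "!B", 2: "!H", 4: "!I"}


def encode_format_code(code, width):
    """Pack one group selector into `width` bytes.

    Raises struct.error if `code` does not fit in the chosen width.
    """
    return struct.pack(WIDTH_CODES[width], code)


def selector_overhead(num_fields, width):
    """Bytes spent on selectors when one selector is sent per field."""
    return num_fields * width
```

With one selector per column, a 20-column request spends selector_overhead(20, 1) = 20 bytes at 8 bits and selector_overhead(20, 4) = 80 bytes at 32 bits, which is the bandwidth tradeoff in question.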

* The DataRow/BinaryRow distinction obviously doesn't scale to multiple
transform groups. I propose dropping the BinaryRow message type in
protocol 3.0, and instead carrying the format code (group selector)
somewhere else. A straightforward conversion would be to add it to the
body of DataRow, but I'm not convinced that's the best place; again,
read on.

* At what granularity do you wish to select the transform group type for
data being transferred in or out? Right now we've essentially assumed
that you only need to specify it once for an entire command result, but
it's fairly easy to imagine scenarios where this isn't what you want.
For example, very many people are going to want to send or receive bytea
fields as raw binary, since that's more or less the native
representation for the client (nobody likes escaping or unescaping).
It does not follow that they want raw binary for, say, timestamp fields
appearing in the same table. The problem gets even more pressing if you
want to use transform groups as a substitute for things like DateStyle,
as Peter suggested in the above-quoted message.

* ISTM the most natural granularity for specifying transform group is at
the column level. I can't see a good use-case for varying transform
type across rows of a select result, but being able to select it for
each column has clear usefulness.

* That leaves us with two issues: where does the client say what it
wants, and where does the backend report the actual transform group used
for each column? For SELECTs, from an efficiency point of view it'd be
nicest to have the client request desired transforms in Bind, and then
we could have RowDescription report the actual transforms used for each
column. This way there'd be no need to include transform info in
DataRow, which would be redundant if one doesn't expect per-row changes
in transform. I'd suggest allowing Bind to specify either a single
transform group to be applied to all columns, or per-column groups.
We'd remove the output-is-binary field from Execute.
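
The "single group or per-column groups" choice in Bind could be resolved roughly as follows. This is only a sketch: it assumes the request is represented either as one integer or as a list with one code per result column, and the function name resolve_result_formats is hypothetical.

```python
def resolve_result_formats(requested, num_columns):
    """Expand a Bind-style request into one format code per result column.

    `requested` is either a single int (one transform group applied to
    every column) or a sequence with exactly one code per column.
    The returned list is what RowDescription would then report.
    """
    if isinstance(requested, int):
        return [requested] * num_columns
    codes = list(requested)
    if len(codes) != num_columns:
        raise ValueError(
            "expected %d format codes, got %d" % (num_columns, len(codes))
        )
    return codes
```

For a three-column result where only the middle column is bytea, resolve_result_formats([0, 1, 0], 3) keeps the other columns textual, while resolve_result_formats(1, 3) asks for standard binary everywhere.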

* More or less the same considerations apply for parameter values being
sent in a Bind message. Here I'd opt for always sending a transform
group for each parameter value being sent.
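
As an illustration of always pairing a transform group with each parameter value, consider the Python sketch below. The wire layout it uses (a one-byte format code, a four-byte length, then the value bytes) is purely an assumption made for the example and is not the layout of any actual Bind message; encode_parameters is a hypothetical name.

```python
import struct


def encode_parameters(params):
    """Serialize (format_code, value_bytes) pairs.

    ASSUMED layout per parameter, for illustration only:
    1-byte format code, 4-byte big-endian length, then the raw value.
    """
    out = bytearray()
    for code, value in params:
        out += struct.pack("!BI", code, len(value))
        out += value
    return bytes(out)
```

A text integer and a raw bytea value can then travel in the same message: encode_parameters([(0, b"42"), (1, b"\xde\xad")]).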

* The client can hardly be expected to select per-column transforms in
Bind if it doesn't know the result column datatypes yet. In the
protocol document as it stands today, there's no way to find out the
result datatypes except a portal Describe --- which requires that you've
already done Bind. I took out the result datatypes in
prepared-statement Describe because it seemed unnecessarily complicated
to implement (there's no support in the backend right now to derive a
tupdesc from a plan without starting the executor). Clearly that'll
have to be put back though. Presumably the RowDescription returned by
prepared-statement Describe will return default (zero == text) transform
groups for all columns, and the client will have to know to believe its
own requests instead if it doesn't trouble to do a portal Describe after
Bind.
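
On the client side, the flow described here might look like the Python sketch below. It assumes the client learns result column type names from a prepared-statement Describe and wants raw binary only for bytea; the function names, the type-name strings and the default set of binary types are all hypothetical.

```python
TEXT, BINARY = 0, 1  # the two transform groups defined so far


def choose_result_formats(column_types, binary_types=frozenset({"bytea"})):
    """Pick a per-column format code from described column type names."""
    return [BINARY if t in binary_types else TEXT for t in column_types]


def effective_formats(requested, described=None):
    """Formats the client should assume for incoming rows.

    `described` holds the codes from a portal Describe issued after Bind,
    or None if the client skipped it. In that case the client has to
    believe its own request, since the statement-level description only
    carries the default (text) codes.
    """
    return list(described) if described is not None else list(requested)
```

For a result of type (int4, bytea, timestamp), choose_result_formats yields [0, 1, 0], and effective_formats([0, 1, 0]) returns the same list when no portal Describe was done.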

* Textual COPY doesn't need any changes since it'll presumably always
use transform group zero, but what do we do with binary COPY? Probably
the best thing is to add an optional header field showing the transform
group for each column, with the default assumption being that all
columns are transform group 1 (standard binary). I don't know what the
user does in the COPY TO command to select other transform groups, but
that's not a protocol-level issue so it need not be solved today.

Comments? In particular I need some feedback about how wide to make the
transform-group fields ...

regards, tom lane
