From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-24 17:55:56
Message-ID: CA+TgmoYmM1wgN4Qpmh4qeBCuC68OHFnVUBjfFx7erUwXZCSiqg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 24, 2012 at 11:16 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> Our current protocol allocates a 2-byte integer for the purposes of
>> specifying the type of each parameter, and another 2-byte integer for
>> the purpose of specifying the result type... but only one bit is
>> really needed at present: text or binary.  If we revise the protocol
>> version at some point, we might want to use some of that bit space to
>> allow some more fine-grained negotiation of the protocol version.  So,
>> for example, we might define the top 5 bits as reserved (always pass
>> zero), the next bit as a text/binary flag, and the remaining 10 bits
>> as a 10-bit "format version number".  When a change like this comes
>> along, we can bump the highest binary format version recognized by the
>> server, and clients who request the new version can get it.
>>
>> Alternatively, we might conclude that a 2-byte integer for each
>> parameter is overkill and try to cut back... but the point is there's
>> a bunch of unused bitspace there now.  In theory we could even do
>> something like this without bumping the protocol version, since the
>> documentation seems clear that any value other than 0 and 1 yields
>> undefined behavior, but in practice that seems like it might be a bit
>> too edgy.
>
> Yeah.  But again, this isn't a contract between libpq and the server,
> but between the application and the server...

I don't see how this is relevant. The text/binary format flag is
there in both libpq and the underlying protocol.
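
To make the split I sketched above concrete, it might look something
like this; none of these names or masks exist anywhere today, so treat
it purely as illustration:

#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout of a 16-bit per-parameter format code:
 *   bits 15-11  reserved, always pass zero
 *   bit  10     0 = text, 1 = binary
 *   bits 9-0    format version number
 */
#define FORMAT_RESERVED_MASK 0xF800
#define FORMAT_BINARY_FLAG   0x0400
#define FORMAT_VERSION_MASK  0x03FF

static inline uint16_t
make_format_code(bool binary, uint16_t version)
{
    return (binary ? FORMAT_BINARY_FLAG : 0)
         | (version & FORMAT_VERSION_MASK);
}

static inline bool
format_is_binary(uint16_t code)
{
    return (code & FORMAT_BINARY_FLAG) != 0;
}

static inline uint16_t
format_version(uint16_t code)
{
    return code & FORMAT_VERSION_MASK;
}

Under a bumped protocol version, a client wanting today's binary
behavior would pass make_format_code(true, 0) and raise the version
argument as new formats appear.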

>  So I'd vote against any format code
> beyond the text/binary switch that currently exists (which, by the
> way, while useful, is one of the great sins of libpq that we have to
> deal with basically forever).  While wire formatting is granular down
> to the type level, applications should not have to deal with that.
> They should Just Work.  So who decides what format code to stuff into
> the protocol?  Where are the codes defined?
>
> I'm very much in the camp that sometime, presumably during connection
> startup, the protocol accepts a non-#defined-in-libpq token (database
> version?) from the application that describes to the server what wire
> formats can be used and the server sends one back.  There probably has
> to be some additional facilities for non-core types but let's put that
> aside for the moment.  Those two tokens allow the server to pick the
> highest supported wire format (text and binary!) that everybody
> understands.  The server's token is useful if we're being fancy and we
> want libpq to translate an older server's wire format to a newer one
> for the application.  This of course means moving some of the type
> system into the client, which is something we might not want to do
> since among other things it puts a heavy burden on non-libpq driver
> authors (but then again, they can always stay on the v3 protocol,
> which can benefit from being frozen in terms of wire formats).

I think it's sensible for the server to advertise a version to the
client, but I don't see how you can dismiss add-on types so blithely.
The format used to represent any given type is logically a property of
that type, and only for built-in types is that associated with the
server version.
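
For the built-in types, the negotiation you're describing boils down
to each side advertising a number and both using the smaller one;
roughly like this (everything here is hypothetical, nothing of the
sort exists in the v3 protocol):

#include <stdint.h>

/*
 * Hypothetical startup negotiation: the client advertises the newest
 * wire-format version it understands, the server reports its own, and
 * both sides then speak the older of the two.
 */
typedef struct
{
    uint32_t client_version;    /* sent in the startup packet */
    uint32_t server_version;    /* reported back by the server */
} FormatNegotiation;

static uint32_t
negotiated_version(const FormatNegotiation *n)
{
    return n->client_version < n->server_version
         ? n->client_version
         : n->server_version;
}

But neither number says anything about the wire format of an add-on
type; you'd need some per-type versioning on top of this before a
single token could possibly be enough.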

I do wonder whether we are making a mountain out of a molehill here,
though. If I properly understand the proposal on the table (and it's
possible that I don't), the new format is self-identifying: when the
optimization is in use, it sets a bit that
previously would always have been clear. So if we just go ahead and
change this, clients that have been updated to understand the new
format will work just fine. The server uses the proposed optimization
only for arrays that meet certain criteria, so any properly updated
client must still be able to handle the case where that bit isn't set.
On the flip side, clients that aren't expecting the new optimization
might break. But that's, again, no different than what happened when
we changed the default bytea output format. If you get bit, you
either update your client or shut off the optimization and deal with
the performance consequences of so doing. In fact, the cases are
almost perfectly analogous, because in each case the proposal was
based on the size of the output format being larger than necessary,
and wanting to squeeze it down to a smaller size for compactness.
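
Assuming the new bit lands in the flags word of the existing binary
array header (which today only reports whether the array contains
NULLs), an updated client just branches on it. I'm guessing at the
exact shape of the optimized encoding here - the flag name, its value,
and the single shared length word are all made up - but the branch is
the point:

#include <stddef.h>
#include <stdint.h>

/* Minimal big-endian reader over a received message body. */
typedef struct { const uint8_t *p; } Buf;

static int32_t
read_int32(Buf *b)
{
    uint32_t v = ((uint32_t) b->p[0] << 24) | ((uint32_t) b->p[1] << 16)
               | ((uint32_t) b->p[2] << 8) | (uint32_t) b->p[3];
    b->p += 4;
    return (int32_t) v;
}

/* Hypothetical new bit alongside the existing has-nulls bit (0x1). */
#define ARR_FIXED_LEN_ELEMS 0x00000002

static void
decode_elements(Buf *b, int nelems, int32_t flags,
                void (*elem)(const uint8_t *data, int32_t len))
{
    if (flags & ARR_FIXED_LEN_ELEMS)
    {
        /* Optimized encoding: one shared length word, then raw values. */
        int32_t len = read_int32(b);
        for (int i = 0; i < nelems; i++)
        {
            elem(b->p, len);
            b->p += len;
        }
    }
    else
    {
        /* Existing encoding: int32 length (-1 = NULL) before each value. */
        for (int i = 0; i < nelems; i++)
        {
            int32_t len = read_int32(b);
            elem(len == -1 ? NULL : b->p, len);
            if (len > 0)
                b->p += len;
        }
    }
}

A client that predates the flag has only the second branch, so when
the server sets the bit it misreads the stream and breaks - the bytea
situation all over again.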

And more generally, does anyone really expect that we're never again
going to change the output format of any type we support, or that we'll
retain infinite backward compatibility when we do? I didn't hear any screams
of outrage when we updated the hyphenation rules for contrib/isn -
well, ok, there were some howls, but that was because the rules were
still incomplete and US-centric, not so much because people thought it
was unacceptable for the hyphenation rules to be different in major
release N+1 than they were in major release N. If the IETF goes and
defines a new standard for formatting IPv6 addresses, we're likely to
eventually support it via the inet and cidr datatypes. The only
things that seem reasonably immune to future changes are text and
numeric, but even with numeric it's not impossible that the maximum
available precision or scale could eventually be different than what
it is now. I think it's unrealistic to suppose that new major
releases won't ever require drivers or applications to make any
updates. My first experience with this was an application that got
broken by the addition of attisdropped, and sure, I spent a day
cursing, but would I be happier if PostgreSQL didn't support dropping
columns? No, not really.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
