Re: Fwd: Proper query implementation for Postgresql driver

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Shay Rojansky <roji(at)roji(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: Proper query implementation for Postgresql driver
Date: 2014-09-30 05:46:11
Message-ID: 542A43A3.500@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09/30/2014 01:09 PM, Shay Rojansky wrote:

> The idea of using extended query protocol for non-prepared queries
> raised another "best practices" question, which I'm hoping you can help
> with. To get rid of text encoding (which is locale-dependent,
> inefficient etc.) for certain fields, it seems that we have to get rid
> of it for *all* fields. This is because we send queries without knowing
> their result columns in advance, and would like to pack all messages
> into a single packet to avoid roundtrips. In other words, when we send a
> Bind message we don't yet have a RowDescription, and so we can' specify
> on an individual columnar basis which fields should be returned as
> binary and which as text. We're left with the option of either going
> "full binary" or having to support text encoding for *all* fields - the
> abolition of which was the whole point.

Even if you can't get rid of text support, dropping simple query
protocol support and the need to support client-side parameter binding
may well be a pleasant improvement.

> We can split the query process into two roundtrips - wait for the
> RowDescription and only then send Bind - in order to pick-and-choose
> text/binary on a columnar basis, but this is a non-trivial performance
> hit which we want to avoid.

It may be worth doing exactly this if you're doing large batches where
it might be a real win to use as many binary parameters as possible, but
I agree that you wouldn't want to do it for one-shot queries.

> Regarding the option of going "full binary"; selecting all types from
> pg_type shows hundreds of types. We'd have to provide binary
> encode/decode implementation for most (?) of them.

All of them except 'internal' and 'unknown', really. There are some you
may not see in common queries, but you'll still run into them when
you're looking at the system catalogs.

Quite a few are binary compatible with each other, though, so you'll
need fewer individual implementations than you might expect. Take a look
at the castmethod in pg_cast to identify groups of binary compatible types.

> In addition, say a
> new type is added (via an extension for example); text encoding at least
> had the advantage of not forcing us to support everything: the unknown
> type would be transferred as text and we'd provide that text to the user
> as an unprocessed string. Going full binary seems to eliminate this
> possibility.

It does.

> I'd like to understand what we're supposed to do, as a Postgresql
> driver. Are we supposed to:
> 1) Go full binary and implement all types (but what to do about unknown
> ones)?

It's also possible for a type not to have send/recv functions, i.e. to
support text-only use.

From the docs
(http://www.postgresql.org/docs/9.3/static/sql-createtype.html):

"The support functions input_function and output_function are required,
while the functions receive_function, send_function, ... are optional."

However, no built-in type lack binary I/O functions.

You could reasonably require that all user defined extension types must
support binary I/O. This will probably be fine in practice. As you said,
though, users would then have to install plugin for nPgSQL for each
custom type they wished to use because nPgSQL won't otherwise know what
to do with the binary data.

> 2) Do two roundtrips for queries, thereby hurting performance?
> 3) Support text encoding for all fields, and manage somehow with issues
> such as locale variations (by forcing the locale to be culture
> invariant, as we do now)?

That's pretty much what PgJDBC does, playing with extra_float_digits,
client_encoding, TimeZone, etc.

It's not lovely.

I would like to be able to specify a top-level option at Bind/Execute
time that asks the server to send binary for built-in types only, or for
a list of oids that we register ourselves as understanding binary for at
a session level. That would require a protocol change, though.

It might be worth taking some time to think about how we can help
clients get best use out of the binary protocol and noting it on the
TODO page for when we do protocol revision 4.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shay Rojansky 2014-09-30 06:15:40 Re: Fwd: Proper query implementation for Postgresql driver
Previous Message Michael Paquier 2014-09-30 05:45:11 Re: Options OUTPUT_PLUGIN_* controlling format are confusing (Was: Misleading error message in logical decoding)