Re: binary protocol, again

From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: "P(dot) Christeas" <xrg(at)linux(dot)gr>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: binary protocol, again
Date: 2012-07-20 23:57:06
Message-ID: CA+mi_8aaBR5_Dik+eaGYM3cr+qnWP49ZGixVx2k8w=cnyVhxEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: psycopg

On Fri, Jul 20, 2012 at 7:00 PM, P. Christeas <xrg(at)linux(dot)gr> wrote:
> On Friday 20 July 2012, you wrote:
>> You should be hunting for semicolons in the query with knowledge of
>> comments (single line, multiline, nested), strings (including
>> dollar-quoted strings, unicode-escape, standard-conforming)... It
>> seems a painful road.
>
> I follow the "better safe than sorry" approach: if the query has anything
> after a semicolon, it /could/ be a multi-query case and will failover.
> Usually, user input should only appear in parameters so cannot introduce a
> semicolon into the query string.

We have a small impedance mismatch, because we are thinking about the
same new cursor, but I am doing it in terms of "a cursor using
PQexecParams instead of PQexec, you are thinking about "a cursor using
binary types instead of text". The two cases are somewhat orthogonal,
actually though they depend each other: you can use binary stuff (as
query params) only if you use PQexecParams.

So, if you want to have binary params, why do you want to make your
life more complex supporting PQexec too and switching from a function
to the other according to the query content or the params types? Why
doesn't the cursor_bin (as you call it) always use only PQexecParams
and the ISQLParam protocol for adaptation? Yes, there are some missing
features, but I don't think it's a big deal: people will be documented
about the features and the shortcomings of either cursors and go for
what they prefer. I find the idea of the failover somewhat dangerous
as which code path will be taken will depend on the content of the
query string.

>> Great :) Sorry though, I haven't got it: have you decided introducing
>> now the cursor_bin, hence you don't need parsing the query to see if
>> it's eligible or were you already using the cursor_bin? In the latter
>> case, why are you parsing the query and not always using PQexecParams?
>
> cursor_bin /tries/ to use the binary protocol, but may still fail over into
> the text one, if the queries are not eligible.

Yes, but in order to fall back on the text protocol it is not
necessary to go back to PQexec: you can use PQexecParams with text
params. If people want to use multi-statements, identifier parameters
or other features supported in the more flexible ISQLQuote, they will
just not use cursor_bin. It is both complex on our side, error prone
on both sides and surprising on their side: I don't see any advantage
justifying that. It's much more clean to just fail with whatever error
the libpq passes us in case multi-statements are used instead of
trying a fallback that results in entirely different code being
executed (different adapters etc.)

> It is just a way of apps to say "yes, I want to go binary".
> If you think about it, there is cases where we may need to mix PQexecParams
> and PQexec in the same transaction, using the same pythonic cursor. So, that
> class must be capable of both types of call.

You can support both protocols in the same transactions by using
different cursors in the same connection. Trying to use the same
cursor looks amazingly messy to support and don't think it's worth.
What is a case in which people would want to use the same cursor for
the two param styles?

> The other thing is binary *results*. Using a different cursor clearly indicates
> that the application agrees to get the results in binary (ie. I won't break
> the world without asking), so we could set 'cursor_bin' to use that even for
> PQexec calls.

If we get a fairly complete set of typecasters for both text and
binary we could use them for the classic cursor as well on PQexec:
this seems just orthogonal wrt using PQexecParams in cursor_bin.

>> Lovely (not bikeshedding about the name: the important is the
>> protocols to be distinct)
>
> A note here: I'm planning to use the *same* adapter classes for both the
> ISQLQuote and ISQLParam usages. I think it would be ugly to have two types of
> BINARY, two types of CHAR etc. Here, you can say that if your parameters are
> adapted to [CHAR, CHAR, INTEGER] you can go PQExecParams, while if they are
> non adaptable like [CHAR, CHAR, MYFOO] it has to failover.

This is dangerous and obscure. I think, if MYFOO has an adapter or is
conform to PQexecParams then it can be passed as parameter to
cursor_bin, otherwise you get an adapter error saying what's wrong. In
other words, if you are trying to push PQexecParams for use in the
classic cursors and falling back in case you are even remotely afraid
that something may break, then no: no deal. It's not going to happen.
We don't do magic: we do boring predictable stuff, and changing
codepath based on the content of the sql string or the parameters type
is just too surprising and brittle. People may have an ISQLParam
adapter for their MyFoo and suddenly an application would break with
an adaptation error because a ; with a comment after is added to the
query and no adapter for the "legacy" ISQLQuote had been provided.

About the ugliness of having two adapters sets: either this or bloat
in each adapter will ensure. Declaring an adapter conform to both the
protocols can be easily done, it's part of the PEP 246 (the part we
implement), but ISQLParam must already have one text and one binary
adaptation function, plus possibly a method to tell whether text or
binary is to be used, plus one to get the type oid, and I don't think
I want this stuff on the simpler (or complex for its own reason)
ISQLQuote adapters. Another option would be to have a method
get_data() returning a pair (bool, str) where the bool specifies
whether str is binary or not. But even if text is required, it will be
a different string from ISQLQuote's getquoted() result (e.g. a python
date is returned as "E'2012-07-20'::date" in ISQLQuote but just as
"2012-07-20" in ISQLParam in text mode).

To mitigate the ugliness we should rather have a better package
organization: psycopg2 has two main sub-modules: "extensions" which is
"everything not strictly DBAPI" and "extras" which is (as its
docstring says) "everything I don't know yet where to put". Well,
after 12 something years psycopg is been around, we have a fairly
precise idea of what we have: 1. different subclasses of connections
and cursors and 2. adapters for extra types; that's pretty much
everything. I would organize them in "connections", "cursors", "types"
modules, keeping the old ones for hysterical raisins and not adding
anything anymore there. Adapters for the two protocols could live in
different sub-modules: psycopg2.types.quote and psycopg2.types.param
for example. The case of the strict compliancy to the dbapi (hence the
extensions module) is something I don't think has ever been exercised
in psycopg's life and I don't even think the module still compiles
without extensions, so I'd be happy to drop it (Fog, correct me if I'm
wrong).

Writing the dates example above, I've thought about another
inconsistency between the two behaviours: ISQLParam would require more
often the use of explicit cast next to the placeholders, because for
many types we append a type to disambiguate. You won't do that in
ISQLParam so some queries working in ISQLQuote would fail. OTOH you
may decide to add another feature to ISQLParam: getting the name of
the Postgres type of the adapted object (e.g. you could convert a
python placeholder %s into a postgres placeholder $1::date). But then
you would be much more uniform than we currently are in passing the
types, so you may have the opposite scenario: queries working in
ISQLParam but failing in ISQLQuote. Things may work as expected, or
some query may break with a postgres error due to cast ambiguity: I
prefer this happening only once and immediately when a program is
being developed and not potentially lurking and exploding at runtime
because of an unexpected type in a parameters set.

> WDYT?

IT: KISS :) Let's have the cursor class adapting on ISQLQuote, using
PQexec and binding client side; and cursor_bin adapting on ISQLParam,
using PQexecParams and binding server side. No automatic switch
between the two: I thank you if you want to do this to improve the
basic cursor performance but I don't want it as there's just no way to
make it robust enough. We can design a feature to use globally a
certain connection/cursor class by default so that people would decide
early in their app what to use. We could also have "from psycopg2
import psycopg3" to get the cursor_bin type automatically :) (Just
joking: I'd be happy to see this stuff in psycopg 2.5).

Cheers!

-- Daniele

In response to

Responses

Browse psycopg by date

  From Date Subject
Next Message P. Christeas 2012-07-21 11:34:22 Re: binary protocol, again
Previous Message P. Christeas 2012-07-20 18:00:44 Re: binary protocol, again