Re: Confusion over Python drivers

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Florian Weimer <fw(at)deneb(dot)enyo(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Confusion over Python drivers
Date: 2010-02-08 20:38:40
Message-ID: 1265661520.29919.2261.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2010-02-08 at 20:29 +0100, Florian Weimer wrote:
> I'm contemplating to create a new language binding for libpq (or, to
> be more precise, turn an existing language binding into something that
> can be published). I've been agonizing a bit over how to create a
> bridge between the host language type system and the PostgreSQL type
> system. If I understand you correctly, you suggest to leave
> everything as strings. This solution has the appeal of being
> implemented easily. It also sidesteps a lot of issues revolving
> around different representation choices for numbers.

Agreed. Ultimately, the conversion has to be done somewhere, but I don't
believe the driver is the place for it. Type conversions are always
going to be imperfect, and this has some important consequences:
* The type conversion system will be endlessly tweaked to improve it
* Developers will always run into problems with it in any complex
application, so we need to allow them to circumvent the system and do it
themselves when necessary.

Both of these things point to another layer on top of the driver itself.
It could be some extra convenience functions that come with the driver,
or an entirely separate layer (like ActiveRecord). But if we always let
the developer have access to the full power of libpq, it limits the
damage that can be done by a slightly-too-creative API on top of it.

> Do you really suggest to preserve the PQexecParams API verbatim, that
> is, passing in three arrays containing type, value, and format? That
> seems to be a bit problematic. I suspect the common case will be to
> use unknown types, text format, and the default conversion from values
> to strings.

I tried to address this specifically in the document:

"For example: it should be easy to pass parameters so that PQexecParams
(and others) can be used, avoiding SQL injection risks. The important
thing is to maintain close to a one-to-one mapping between libpq and the
driver's API, and to provide all of the functionality of libpq."

In ruby-pg, you can just do:

conn.exec("INSERT INTO foo VALUES($1)", ["Jeff"])

And I think that's appropriate. What I'm saying is that there should
still exist some way to pass explicit types or formats (although that
should still be easier than it is in C ;). Here's the long form:

conn.exec("INSERT INTO foo VALUES($1)",
[{:value => "Jeff", :format => 0, :type => 0}])

The nice thing about that format is that you can do the "easy" thing for
most of the parameters in a query, but then choose binary format for
that one BYTEA parameter. That's because, in ruby, you can mix strings
and hashes in the same array.

So I'm not saying we should make everyone code ruby that looks like C.
I'm saying that the job of the driver is to provide full access to
libpq, and anything beyond that should be an optional convenience
routine, and should be free of magic and cleverness (that's the job of a
higher layer).

> Conversely, for result sets, I'm tempted to transparently decode
> escaped BYTEA columns.

Consider the following ruby-pg program, where you have two empty tables
foo and bar, each with a single BYTEA column "b":

conn = PGconn.connect(...)
conn.exec("INSERT INTO foo VALUES($1)", ["\\\\000"])

# copy the single value in foo into bar
val = conn.exec("SELECT b FROM foo LIMIT 1")[0]["b"]
conn.exec("INSERT INTO bar VALUES($1)", [val])

That copies value so that foo and bar have the same contents: a 4 byte
value "\000". What would happen though, if val was transparently
decoded? It would decode it once in ruby, and again inside of postgres
(in byteain), leaving you with a one byte value in bar, even though foo
has a four-byte value.

I really think that only higher layers should implement that kind of
magic, no matter how "obvious" it may seem that the user wants something
extra.

> > Note that the ruby-pg driver doesn't 100% adhere to those standards
> > (encoding is the primary problem, and that will be fixed).
>
> Lack of Unicode support means that I can punt that to application
> authors, I guess.

Ruby 1.9+ and Python 3.0+ both have string encoding models that can't
just be ignored. We could punt it by always returning byte sequences
rather than strings, but I think that's a particularly extreme version
of my philosophy of not trying to convert between types.

> By the way, the downside of using strings everywhere is that your
> binding API will most likely not work with SQLite (or any other
> SQL-like database which lacks column type information).

I am trying to develop standards suitable for PostgreSQL drivers based
on libpq. These are not meant to be standards for a database-agnostic
API, standards for a high-level database adapter, or even standards for
a driver written against something other than libpq (like the JDBC
driver).

Thank you for your comments. I will try to integrate these thoughts into
the document.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tim Bunce 2010-02-08 20:40:31 Re: Add on_plperl_init and on_plperlu_init to plperl UPDATE 3 [PATCH]
Previous Message Alvaro Herrera 2010-02-08 20:30:59 Re: Writeable CTEs patch