Re: Confusion over Python drivers

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Andrew McNamara <andrewm(at)object-craft(dot)com(dot)au>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Confusion over Python drivers
Date: 2010-02-09 02:38:47
Message-ID: 1265683127.29919.2552.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2010-02-09 at 12:51 +1100, Andrew McNamara wrote:
> Now, with the text format parameters, the parser usually does the right
> thing, since text formats have plenty of hints for us humans.

The parser doesn't care whether it's text format or binary format when
determining the type.

> However, with the binary format, unless the caller tells us, there's no way
> to tell whether we're correctly parsing the data. If the context implies
> one type, but the user passes another, we'll either get an ugly error or,
> worse, silently misparse their data.

The difference between text and binary format is this: after it has
already determined the type of the parameter,
(a) if the format is text, it passes it to the type input function
to construct the value; or
(b) if the format is binary, it passes it to the type recv function
to construct the value.

The argument to the input or recv functions may:
(a) be valid input; or
(b) be invalid input, and be detected as an error by the input or
recv function; or
(c) be invalid input, and not be detected as an error by the input
or recv function.

For a given type, the input function may be more likely to catch an
input error than the recv function; or the reverse. Either way, it is
very type-specific, and the only difference is the whether the input is
misinterpreted (type error not caught; bad) or an error is thrown (type
error caught; better).

> Using my ocpgdb module, and interacting directly with the libpq wrapping
> code, you can see how postgres reacts to various inputs:

None of the examples show a difference in the inferred type of a text
versus binary parameter for the same query.

> No parameters:
>
> >>> r=db.execute('select 1', ())
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb7514200>,)]

Expected, because the literal 1 (without quotes) is an integer literal,
not an unknown literal.

> Int4 parameter, type specified:
>
> >>> r=db.execute('select $1', [(pgoid.int4, '\x00\x00\x00\x01')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb75141c0>,)]

Expected, because you specified the type, and sent the binary data to
the integer recv function, and it was valid input.

> Int4 parameter, type unknown, can't be determined from context:
>
> >>> r=db.execute('select $1', [(pgoid.unknown, '\x00\x00\x00\x01')])
> >>> r.status
> PGRES_FATAL_ERROR
> >>> r.errorMessage
> 'ERROR: could not determine data type of parameter $1\n'

Expected -- there is no context to determine the type.

Why do you call it an int4 parameter? It's just bytes, and you never
told postgres what they are (as you did in the previous example).

> Int4 parameter, type unknown, can be determined from context:
>
> >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x00\x01')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x02' at 0xb7514200>,)]

Expected: the function + provides the context that allows the server to
interpret the left argument as an integer.

(Again, not an int4 parameter, it's unknown)

> Text parameter, type unknown, mismatching context - surprising:
>
> >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '1111')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '1112' at 0xb7514360>,)]

Expected, because this is exactly the same as the previous one except
for the data you pass in. Notice that the same type is inferred (23).

Why do you call this "mismatching context" when the context is exactly
the same as above? The only difference is which 4 bytes you provide. You
never told postgres that the bytes were text bytes anywhere.

You may think that it's doing 1111 + 1, but it's actually doing addition
on the bytes. That is apparent in the next example:

> Date parameter, type unknown, int context, the value gets misinterpreted:
>
> >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x01n')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x01o' at 0xb75144a0>,)]
>

Expected, because the only thing that could possibly detect the error is
the int4recv function, which happens to accept any 4-byte input (so it
will never fail on any 4 bytes of data).

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew McNamara 2010-02-09 02:56:49 Re: Confusion over Python drivers
Previous Message Fujii Masao 2010-02-09 01:57:44 Re: Streaming replication in docs