Re: [PATCH] plpythonu datatype conversion improvements

From: Pierre Frédéric Caillaud <lists(at)peufeu(dot)com>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Caleb Welton" <cwelton(at)greenplum(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] plpythonu datatype conversion improvements
Date: 2009-08-16 12:35:40
Message-ID: op.uyq89qqfcke6l8@soyouz
Lists: pgsql-hackers


>> Primary motivation of the attached patch is to support handling bytea
>> conversion allowing for embedded nulls, which in turn allows for
>> supporting the marshal module.
>>
>> Secondary motivation is slightly improved performance for conversion
>> routines of basic datatypes that have simple mappings between
>> postgres/python.
>>
>> Primary design is to change the conversion routines from being based
>> on cstrings to datums, eg:
>> PLyBool_FromString(const char *) =>
>> PLyBool_FromBool(PLyDatumToOb, Datum);
>
> I have reworked this patch a bit and extended the plpython test suite
> around it. Current copy attached.
>
> The remaining problem is that the patch loses domain checking on the
> return types, because some paths no longer go through the data type's
> input function. I have marked these places as FIXME, and the regression
> tests also contain a failing test case for this.
>
> What's needed here, I think, is an API that takes a datum plus type
> information and checks whether the datum is valid within the domain. I
> haven't found one that is exported, but maybe someone could give a tip.

I see an intersection between the work I'm currently doing on COPY BINARY
and this.

Basically, if you have an INT, there aren't many checks to make.

However, for a TEXT, postgres needs to reject it if it contains an embedded
NULL byte (which doesn't bother Python at all), or if it contains characters
that are not valid in the current encoding, etc.
Many other types, like TIMESTAMP, have checks which are absolutely necessary
for correctness...
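
As a rough sketch (the function name and the error code choice here are
mine, just for illustration), this is the sort of work a datum-based text
conversion would have to keep doing itself once it no longer goes through
textin:

#include "postgres.h"
#include "mb/pg_wchar.h"
#include "utils/builtins.h"

/*
 * Hypothetical sketch: checks a datum-based text conversion must do
 * itself when it bypasses the type's cstring input function.
 */
static Datum
python_string_to_text(const char *buf, int len)
{
    /* a cstring-based path would simply stop at the first NULL byte;
     * here it has to be rejected explicitly */
    if (memchr(buf, '\0', len) != NULL)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("could not convert Python string: embedded null byte")));

    /* verify the bytes are valid in the server encoding (errors out if not) */
    pg_verifymbstr(buf, len, false);

    return PointerGetDatum(cstring_to_text_with_len(buf, len));
}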

> What's needed here, I think, is an API that takes a datum plus type
> information and checks whether the datum is valid within the domain. I
> haven't found one that is exported, but maybe someone could give a tip.

Problems:

- If the data you're trying to put in the Datum doesn't fit (for example:
out of range, varchar too small, etc.), and you want a datum-type-specific
function to check your datum and reject it, how are you going to build the
datum in the first place? Perhaps you can't, since your value doesn't fit.
It's a chicken-and-egg problem: the check function that you expect to
reject your invalid datum will not know it's invalid, since you've already
trimmed it at the edges to make it fit in the required Datum type (see the
sketch after this list)...

- You are going to build a datum that is perhaps valid, and perhaps not,
and send it to a function... having known-invalid datums moving around is
probably not such a good idea...
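
To make the chicken-and-egg point concrete, a hypothetical fragment (names
invented for illustration): once an out-of-range Python value has been
squeezed into the target Datum type, the overflow information is gone, so
no check function called on the finished datum can see it:

#include "postgres.h"

/*
 * Hypothetical illustration: 100000 does not fit in an int2.  The cast
 * silently truncates, so a check function that only receives the
 * finished Datum sees an apparently legal value and cannot reject it.
 */
static Datum
python_long_to_int2(long pyvalue)           /* e.g. pyvalue = 100000 */
{
    return Int16GetDatum((int16) pyvalue);  /* overflow is lost here */
}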

Why not use the COPY BINARY format to communicate between Python and Postgres?

-> you write code to serialize Python objects to binary form
-> you call the type's recv function to get a Postgres datum
-> the recv function throws an error if there is any problem (a sketch of
this is below)

-> as a bonus, you release your Python object <-> Postgres binary code as
a separate library, so people can use it to produce data readable by COPY
BINARY and to parse COPY BINARY dumps.
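
A minimal sketch of the recv-based path on the backend side (the wrapper
itself is hypothetical, but getTypeBinaryInputInfo and
OidReceiveFunctionCall are the same machinery COPY BINARY already relies on):

#include "postgres.h"
#include "fmgr.h"
#include "lib/stringinfo.h"
#include "utils/lsyscache.h"

/*
 * Hypothetical wrapper: hand the bytes serialized on the Python side to
 * the type's binary receive function, which performs all of its usual
 * validation and throws an error on any problem.
 */
static Datum
binary_to_datum(Oid typid, int32 typmod, const char *data, int len)
{
    Oid         typreceive;
    Oid         typioparam;
    StringInfoData buf;

    getTypeBinaryInputInfo(typid, &typreceive, &typioparam);

    initStringInfo(&buf);
    appendBinaryStringInfo(&buf, data, len);

    /* rejects malformed input, bad encoding, out-of-range values, ... */
    return OidReceiveFunctionCall(typreceive, &buf, typioparam, typmod);
}

This also keeps the validation in one place: the same code path is
exercised whether the bytes come from plpython or from a COPY BINARY file.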
