Re: plperl.c patch to correctly support bytea inputs and output to functions and triggers.

From: Theo Schlossnagle <jesus(at)omniti(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Theo Schlossnagle <jesus(at)omniti(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: plperl.c patch to correctly support bytea inputs and output to functions and triggers.
Date: 2007-04-28 20:31:46
Message-ID: 6CBC450A-E3F0-4456-8A74-95038BBB7DEE@omniti.com
Lists: pgsql-bugs


On Apr 28, 2007, at 1:26 PM, Tom Lane wrote:

> Theo Schlossnagle <jesus(at)omniti(dot)com> writes:
>> I've found a bug with the way plperl/plperlu handles bytea types. It
>> fails to correctly handle bytea binary inputs and outputs.
>
> Define "correctly". The proposed patch seems to be "let's handle
> bytea differently from every other data type", and that sure doesn't
> sound like a path I want to tread.

As far as I can tell, bytea is the only datatype now that suffers
from data loss. In this I could be mistaken. I took my cues from
the way postgres handles inputting records: it switches on whether
they were received in a binary fashion or not. Since we're inside
and have a Datum (or are making one) already, everything is just
memory chunks and some characteristic of the Oid should be used to
determine whether the data should be treated as binary. As is clear
from the patch, I used "if(Oid == BYTEAOID)" as the characteristic
and perhaps there is a more robust way.
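
The switch described above can be sketched in isolation. This is only an illustration, not the patch itself: the Oid typedef and the two OID constants are local stand-ins for PostgreSQL's own definitions, and conversion_mode and choose_conversion are hypothetical names that do not appear in plperl.c.

```c
/* Local stand-ins for PostgreSQL's definitions so the sketch compiles
 * on its own; real code would take these from the server headers. */
typedef unsigned int Oid;
#define BYTEAOID 17
#define TEXTOID  25

/* Hypothetical names, used here only for illustration. */
typedef enum {
    CONVERT_TEXT,   /* value passes through its NUL-terminated text form */
    CONVERT_BINARY  /* value is copied as raw bytes with an explicit length */
} conversion_mode;

/* Pick the conversion path from the type's Oid, mirroring the
 * "if(Oid == BYTEAOID)" test: only bytea takes the binary path. */
conversion_mode choose_conversion(Oid typid)
{
    return (typid == BYTEAOID) ? CONVERT_BINARY : CONVERT_TEXT;
}
```

A more robust characteristic than a single hard-coded Oid would only change the condition inside choose_conversion; the callers would stay the same.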

If I return a bytea from perl that looks like: "hello\0there",
postgres sees a 5-byte string "hello". That's data loss and makes it
useless as a datatype, as I cannot return things like images and other
binary data.
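
The truncation is easy to reproduce outside the server. The following is a minimal standalone sketch, assuming nothing about plperl.c: byte_span and visible_as_cstring are hypothetical names that compare the true length of a buffer with what a NUL-terminated interface would see.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying view of a binary value. */
typedef struct {
    const unsigned char *data;
    size_t               len;
} byte_span;

/* Number of bytes that survive when the value is handed to code that
 * treats it as a NUL-terminated C string: everything from the first
 * zero byte onward is invisible to that code. */
size_t visible_as_cstring(byte_span v)
{
    const unsigned char *nul = memchr(v.data, 0, v.len);
    return nul ? (size_t)(nul - v.data) : v.len;
}
```

For the 11 bytes of "hello\0there", visible_as_cstring reports 5, so the remaining 6 bytes (the zero byte and "there") are dropped by any string-based path.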

When passing the string E'hello there\015\012' into a bytea receiving
perl function, there is no way for me to get at the actual data
passed to me. Instead I get the cstring: "hello there\\015\\012"
which is 19 characters long instead of the 13 bytes of "bytea" data.
Worse? E'hello\000there' will be materialized as a 5-byte "bytea" in
perl, actually losing the remainder of the data. This also makes it
impossible to work with bytea data in the plperl language; not hard,
impossible.
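
To make the 13-versus-19 arithmetic concrete, here is a simplified sketch of an escape-style text encoding: a backslash is doubled and any byte outside printable ASCII becomes a backslash followed by three octal digits. It is written from scratch for illustration and is not the server's output function; escape_bytes is a hypothetical name.

```c
#include <stddef.h>

/* Escape in[0..in_len) into out, which has room for out_cap chars
 * including the terminating NUL. Returns the number of characters
 * written (excluding the NUL), or (size_t)-1 if out is too small. */
size_t escape_bytes(const unsigned char *in, size_t in_len,
                    char *out, size_t out_cap)
{
    size_t n = 0;

    if (out_cap == 0)
        return (size_t)-1;

    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = in[i];

        if (c == '\\') {
            /* A literal backslash is doubled. */
            if (n + 2 >= out_cap)
                return (size_t)-1;
            out[n++] = '\\';
            out[n++] = '\\';
        } else if (c < 0x20 || c > 0x7e) {
            /* Non-printable byte: backslash plus three octal digits. */
            if (n + 4 >= out_cap)
                return (size_t)-1;
            out[n++] = '\\';
            out[n++] = (char)('0' + ((c >> 6) & 3));
            out[n++] = (char)('0' + ((c >> 3) & 7));
            out[n++] = (char)('0' + (c & 7));
        } else {
            /* Printable ASCII passes through unchanged. */
            if (n + 1 >= out_cap)
                return (size_t)-1;
            out[n++] = (char)c;
        }
    }
    out[n] = '\0';
    return n;
}
```

Fed the 13 bytes "hello there" followed by CR and LF, this yields 19 characters: the 11 printable ones plus two 4-character octal escapes. That escaped text, not the original bytes, is what the perl function receives.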

In a lot of ways, bytea is different from every other data type: it
is one that isn't suitable for character set conversion and doesn't
trivially cast to other varying size data types (like text, varchar,
etc.). It also is the only one (of its friends text, varchar, etc.)
that suffers from data loss if used with InputFunctionCall and
OutputFunctionCall rather than handled correctly with
ReceiveFunctionCall and SendFunctionCall.

If bytea is instead a class of datatypes that represent arbitrary
binary data, I'd agree that the patch should be changed to switch on
that sort of identifier instead of the BYTEAOID Oid. If you'd clue
me in on how one would go about identifying whether the datatype Oid
is to be treated as an arbitrary length octet sequence not subject to
character set conversion, then I'd happily revise the patch to be more
correct.

Best regards,

Theo

// Theo Schlossnagle
// Principal(at)OmniTI: http://omniti.com
// Esoteric Curio: http://www.lethargy.org/~jesus/
