From: | Michael Fuhr <mike(at)fuhr(dot)org> |
---|---|
To: | Mark Richardson <markmapo(at)yahoo(dot)com> |
Cc: | pgsql-interfaces(at)postgresql(dot)org |
Subject: | Re: libpq, ecpg and the bytea data type |
Date: | 2005-09-15 09:08:15 |
Message-ID: | 20050915090814.GA68915@winnie.fuhr.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-interfaces |
On Wed, Sep 14, 2005 at 04:20:13PM -0700, Mark Richardson wrote:
> If I have a table defined...
> CREATE TABLE myTable (myByteaField bytea NOT NULL);
>
> then populate it...
> COPY myTable (myByteaField) FROM stdin;
> \000\000\000\000\000\000\000\000
> \011\101\206\155\136\035\071\135
> \012\000\162\047\105\223\322\121
> \.
Bytea values should be escaped with two backslashes, not one. See
"Binary Data Types" in the documentation:
http://www.postgresql.org/docs/7.4/interactive/datatype-binary.html
> SELECT length(myByteaField) FROM stdin;
Please post the actual command you ran; the above fails with 'relation
"stdin" does not exist.' Presumably you really queried myTable (this
might seem like nitpicking, but sometimes little differences matter).
> Length
> --------
> 0
> 8
> 1
>
> I'm assuming that this is because of the length function (seeing
> the '\0' causes strlen to be shorter than the actual data).
Actually it's because the data wasn't fully loaded due to the
incorrectly-escaped \000 characters:
SELECT '\000\001'::bytea;
bytea
-------
(1 row)
SELECT '\\000\\001'::bytea;
bytea
----------
\000\001
(1 row)
The length function doesn't have a problem with \000 characters:
SELECT length('\\000\\001'::bytea);
length
--------
2
(1 row)
If you look at the implementation of the length(bytea) function
(byteaoctetlen() in src/backend/utils/adt/varlena.c) you'll see
that it doesn't read the data at all; it simply reads a size
attribute that's stored with the data.
> So I try to use the libpq interface, and when I execute the code
> PQgetLength(pgresult, 0, 0)
> the result is 0 (for record 0, field )
Again, please post what you actually did; the above should have
failed to link with an error like "undefined reference to `PQgetLength'"
(the actual function name is PQgetlength).
> PQgetLength(pgresult, 2, 0)
> The result is 1 (for record 2, field 0). So this leads me to
> believe that the values are stored in the database as a string, and
> they are being terminated during the copy (I also tried inserts,
> with the same results).
The values are stored as a sequence of bytes, and they are indeed
being truncated during the COPY because the data was incorrectly
escaped.
> On a last ditch effort, I used ECPG with the following code...
> EXEC SQL BEGIN DECLARE SECTION;
> char myByteaField_[3][8];
> EXEC SQL END DECLARE SECTION;
>
> EXEC SQL SELECT myByteaField INTO :myByteaField_ FROM myTable;
> for (int i=0; i<3; i++)
> for (int j=0; j<8; j++)
> std::cout << myByteaField_[i][j] << "(" << int(myByteaField_[i][j] << ").";
> std::cout << std::endl;
>
> I get the following....
> (0).(0).(0).(0).(0).(0).(0).(0). <- this is expected
The myByteaField_ array might have been zeroed to begin with, and
you're probably seeing that data, not data filled in from the query.
What happens if you fill the array with a character like '*' before
doing the SELECT? If I do that with the data input as in your
example, I see the first character as \0 and the rest of the data
as the garbage in the array:
(0).*(42).*(42).*(42).*(42).*(42).*(42).*(42).
> \(92).0(48).1(49).1(49).A(65).\(92).2(50).0(48) <- why escape sequence?
Non-printable characters are escaped unless you request the results
in binary; see the documentation link posted earlier.
> \(92).0(48).1(49).2(50).(0).(0).(0).(0) <- dropped the last 6 bytes
>
> So I'm pretty sure that the data is stored as a string, and it
> is being terminated at the '\0'.
Bytea and other variable-length types are stored as a length attribute
and then the data itself. See the discussion of TOAST in the
documentation:
http://www.postgresql.org/docs/8.0/interactive/storage-toast.html
> ECPG not only has the same behavior, but the conversion of the
> data to a char array seems incorrect (maybe the ECPG pre-compiler
> is thinking that I want to_char() of the value of the byte array).
Unless you request binary results, queries return text representations
of the data; for bytea that means that non-printable characters will
be escaped.
> So my question is, how would you be able to store any data in a
> bytea if it contained any bytes with the value of 0 (I'm sure there
> are images stored as bytea and they have 0 byte values). I don't
> need to store an image, just these 8 bytes.
Escape the data correctly with two backslashes; the "Binary Data
Types" documentation explains why this is necessary.
--
Michael Fuhr
From | Date | Subject | |
---|---|---|---|
Next Message | Mark Richardson | 2005-09-15 19:18:05 | Re: libpq, ecpg and the bytea data type |
Previous Message | Tom Lane | 2005-09-15 02:49:53 | Re: Beta of DBD::Pg 1.44 available |