Re: libpq, ecpg and the bytea data type

From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Mark Richardson <markmapo(at)yahoo(dot)com>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: libpq, ecpg and the bytea data type
Date: 2005-09-15 09:08:15
Message-ID: 20050915090814.GA68915@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-interfaces

On Wed, Sep 14, 2005 at 04:20:13PM -0700, Mark Richardson wrote:
> If I have a table defined...
> CREATE TABLE myTable (myByteaField bytea NOT NULL);
>
> then populate it...
> COPY myTable (myByteaField) FROM stdin;
> \000\000\000\000\000\000\000\000
> \011\101\206\155\136\035\071\135
> \012\000\162\047\105\223\322\121
> \.

Bytea values should be escaped with two backslashes, not one. See
"Binary Data Types" in the documentation:

http://www.postgresql.org/docs/7.4/interactive/datatype-binary.html

> SELECT length(myByteaField) FROM stdin;

Please post the actual command you ran; the above fails with 'relation
"stdin" does not exist.' Presumably you really queried myTable (this
might seem like nitpicking, but sometimes little differences matter).

> Length
> --------
> 0
> 8
> 1
>
> I'm assuming that this is because of the length function (seeing
> the '\0' causes strlen to be shorter than the actual data).

Actually it's because the data wasn't fully loaded due to the
incorrectly-escaped \000 characters:

SELECT '\000\001'::bytea;
bytea
-------

(1 row)

SELECT '\\000\\001'::bytea;
bytea
----------
\000\001
(1 row)

The length function doesn't have a problem with \000 characters:

SELECT length('\\000\\001'::bytea);
length
--------
2
(1 row)

If you look at the implementation of the length(bytea) function
(byteaoctetlen() in src/backend/utils/adt/varlena.c) you'll see
that it doesn't read the data at all; it simply reads a size
attribute that's stored with the data.

> So I try to use the libpq interface, and when I execute the code
> PQgetLength(pgresult, 0, 0)
> the result is 0 (for record 0, field )

Again, please post what you actually did; the above should have
failed to link with an error like "undefined reference to `PQgetLength'"
(the actual function name is PQgetlength).

> PQgetLength(pgresult, 2, 0)
> The result is 1 (for record 2, field 0). So this leads me to
> believe that the values are stored in the database as a string, and
> they are being terminated during the copy (I also tried inserts,
> with the same results).

The values are stored as a sequence of bytes, and they are indeed
being truncated during the COPY because the data was incorrectly
escaped.

> On a last ditch effort, I used ECPG with the following code...
> EXEC SQL BEGIN DECLARE SECTION;
> char myByteaField_[3][8];
> EXEC SQL END DECLARE SECTION;
>
> EXEC SQL SELECT myByteaField INTO :myByteaField_ FROM myTable;
> for (int i=0; i<3; i++)
> for (int j=0; j<8; j++)
> std::cout << myByteaField_[i][j] << "(" << int(myByteaField_[i][j] << ").";
> std::cout << std::endl;
>
> I get the following....
> (0).(0).(0).(0).(0).(0).(0).(0). <- this is expected

The myByteaField_ array might have been zeroed to begin with, and
you're probably seeing that data, not data filled in from the query.
What happens if you fill the array with a character like '*' before
doing the SELECT? If I do that with the data input as in your
example, I see the first character as \0 and the rest of the data
as the garbage in the array:

(0).*(42).*(42).*(42).*(42).*(42).*(42).*(42).

> \(92).0(48).1(49).1(49).A(65).\(92).2(50).0(48) <- why escape sequence?

Non-printable characters are escaped unless you request the results
in binary; see the documentation link posted earlier.

> \(92).0(48).1(49).2(50).(0).(0).(0).(0) <- dropped the last 6 bytes
>
> So I'm pretty sure that the data is stored as a string, and it
> is being terminated at the '\0'.

Bytea and other variable-length types are stored as a length attribute
and then the data itself. See the discussion of TOAST in the
documentation:

http://www.postgresql.org/docs/8.0/interactive/storage-toast.html

> ECPG not only has the same behavior, but the conversion of the
> data to a char array seems incorrect (maybe the ECPG pre-compiler
> is thinking that I want to_char() of the value of the byte array).

Unless you request binary results, queries return text representations
of the data; for bytea that means that non-printable characters will
be escaped.

> So my question is, how would you be able to store any data in a
> bytea if it contained any bytes with the value of 0 (I'm sure there
> are images stored as bytea and they have 0 byte values). I don't
> need to store an image, just these 8 bytes.

Escape the data correctly with two backslashes; the "Binary Data
Types" documentation explains why this is necessary.

--
Michael Fuhr

In response to

Browse pgsql-interfaces by date

  From Date Subject
Next Message Mark Richardson 2005-09-15 19:18:05 Re: libpq, ecpg and the bytea data type
Previous Message Tom Lane 2005-09-15 02:49:53 Re: Beta of DBD::Pg 1.44 available