Re: COPY fails on 8.1 with invalid byte sequences in text

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: COPY fails on 8.1 with invalid byte sequences in text
Date: 2006-11-03 20:25:22
Message-ID: 1162585522.31124.286.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, 2006-10-31 at 23:18 -0500, Tom Lane wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> > Is this not a bug?
>
> I don't actually see that it is. The documentation is perfectly clear
> on the point:
>
> (It is your responsibility that the byte sequences you create
> are valid characters in the server character set encoding.)
>
> (This is in 4.1.2.1. String Constants) If you don't want to deal with
> this, don't use octal escapes to construct multibyte characters.
>

I have thought about this some more, and I still disagree on a few
points.

First, there is no way, as a DBA, to _not_ use octal escapes, because
there is no option to turn off. I can either inspect the application,
or just trust it, but I can't control it from the database (short of
adding a CHECK to every text column). Since it interferes with the
proper use of COPY, which in turn interferes with the use of pg_dump and
applications such as Slony, I think that it is something that should be
controlled at database.

Second, you pointed out that my patch breaks BYTEA. I'd like to point
out that (as I understand it) the patch only breaks BYTEA if you rely on
the cstring escaping to pass binary data to byteain, like:

=> SELECT E'\377'::bytea;

In my opinion, that's wrong anyway, because it will fail if you do
something like:

=> SELECT E'\377\000\377';

Because the NULL in the middle terminates the cstring, which passes a 1-
byte string to byteain, resulting in a 1-byte value, instead of a 3-byte
value which it should be.

However, if you pass ASCII data to byteain that represents binary data,
like:

=> SELECT E'\\377\\000\\377'::bytea;

It seems to work just fine, and that's what PQescapeByteaConn() does.
There may be a backwards-compatibility issue, but I don't think my patch
is 100% broken (unless, of course, you found some other way that it's
broken).

This is a practical concern because some applications use escaping that
protects against SQL injection, but does not protect against invalid
byte sequences. It's not obvious to an application programmer that
protecting against SQL injection is not enough, particularly if the
programmer is trying to be "database agnostic" or doesn't test against
different encodings. Ideally, they should use PQescapeStringConn() and
not convert anything to an octal escape, but that's not always the case.

Lastly, if we retain the current behavior, I think a note should be
added to the COPY docs. The current docs imply that if you COPY it out,
you can COPY it back in. Even if I read the above documentation note
directly after reading the COPY documentation, it would not make me
think that COPY could fail for built-in types.

Regards,
Jeff Davis

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Rusty Conover 2006-11-03 22:16:35 Operator Classes and ANALYZE
Previous Message Dennis Bjorklund 2006-11-03 18:29:57 Re: COPY fails on 8.1 with invalid byte sequences in text