Re: text and bytea

From: "hernan gonzalez" <hgonzalez(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: text and bytea
Date: 2008-02-25 16:52:17
Message-ID: 48692c2d0802250852w7401bdc4i6f420d4eecf3e933@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Another example (Psotgresql 8.3.0, UTF-8 server/client encoding)

test=# create table chartest ( c text);
test=# insert into chartest (c) values ('¡Hasta mañana!');
test=# create view vchartest as
select encode(convert_to(c,'LATIN9'),'escape') as c1 from chartest;

test=# select c,octet_length(c) from chartest ;
c | octet_length
----------------+--------------
¡Hasta mañana! | 16

test=# select c1,octet_length(c1) from vchartest ;
c1 | octet_length
--------------+--------------
Hasta maana! | 14

(the field is seen as as text by postgresql, with the default
encoding.. UTF8; it is actually not)

test=# select * from vchartest where c1 like '%a%';
c1
--------------
Hasta maana!
(1 row)

test=# select * from vchartest where c1 ilike '%a%';
ERROR: invalid byte sequence for encoding "UTF8": 0xa1
HINT: This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

That "ilike" breaks and "like" is rather random, it seems that the
later has some optimization does not check the validty of the utf8
stream. But thats not the point. The point is that IMO postgresql
should always handle text in the backend encoding, there should no
exists funcions that are designed to produce/consume texts in other
encodings. Perhaps the "encode" function is ill defined, and should be
rethinked. Two alternatives:
1. For special binary-to-ascii encodings (base64,hex). Keep its
present signature but remove/deprecate the "escape" type. It returns a
text in the backend encoding.
2 For arbitrary binary encodings. Change its signature so that it
returns bytea.
Of course, all this applies symmetrically to decode().

Appart, and in accordance with this, I think to_ascii() should accept
only one text argument.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andreas Lau 2008-02-25 17:40:43 syntax error at or near "PROCEDURAL"
Previous Message Tom Lane 2008-02-25 16:43:15 Re: HOW does 8.3 CREATE REPLACe .. FUNCTION ..COST work