Re: invalidly encoded strings

From: db(at)zigo(dot)dhs(dot)org
To: "Tatsuo Ishii" <ishii(at)postgresql(dot)org>
Cc: pgsql(at)j-davis(dot)com, ishii(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, andrew(at)dunslane(dot)net, laurenz(dot)albe(at)wien(dot)gv(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject: Re: invalidly encoded strings
Date: 2007-09-11 06:35:54
Message-ID: 46828.192.121.104.48.1189492554.squirrel@zigo.dhs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

>> Try the sequence below. Then, try to dump and then reload the database.
>> When you try to reload it, you will get an error:
>>
>> ERROR: invalid byte sequence for encoding "UTF8": 0xbd
>
> I know this could be a problem (like chr() with invalid byte pattern).

And that's enough of a problem already. We don't need more problems.

> What I really want to know is, read query something like this:
>
> SELECT * FROM japanese_table ORDER BY convert(japanese_text using
> utf8_to_euc_jp);
>
> could be a problem (I assume we use C locale).

If convert() produce a sequence of bytes that can't be interpreted as a
string in the server encoding then it's broken. Imho convert() should
return a bytea value. If we hade good encoding/charset support we could do
better, but we can't today.

The above example would work fine if convert() returned a bytea. In the C
locale the string would be compared byte for byte and that's what you get
with bytea values as well.

Strings are not sequences of bytes that can be interpreted in different
ways. That's what bytea values are. Strings are in a specific encoding
always, and in pg that encoding is fixed to a single one for a whole
cluster at initdb time. We should not confuse text with bytea.

/Dennis

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2007-09-11 07:17:06 Re: invalidly encoded strings
Previous Message Jeff Davis 2007-09-11 06:35:33 Re: invalidly encoded strings

Browse pgsql-patches by date

  From Date Subject
Next Message Simon Riggs 2007-09-11 06:35:55 Re: HOT patch - version 15
Previous Message Jeff Davis 2007-09-11 06:35:33 Re: invalidly encoded strings