Re: Bug in UTF8-Validation Code?

From:	Tatsuo Ishii <ishii(at)postgresql(dot)org>
To:	tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc:	andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Bug in UTF8-Validation Code?
Date:	2007-04-05 00:56:14
Message-ID:	20070405.095614.95827390.t-ishii@sraoss.co.jp
Lists:	pgsql-hackers

> Andrew - Supernews <andrew+nonews(at)supernews(dot)com> writes:
> > Thinking about this made me realize that there's another, ahem, elephant
> > in the room here: convert().
> > By definition convert() returns text strings which are not valid in the
> > server encoding. How can this be addressed?
>
> Remove convert(). Or at least redefine it as dealing in bytea not text.

That would break some important use cases.

1) A user has a UTF-8 database which contains data in various
languages. Each language has its own table. He wants to sort a SELECT
result by using ORDER BY. Since a locale cannot handle multiple
languages, he uses the C locale and does the SELECT something like this:

SELECT * FROM french_table ORDER BY convert(t, 'LATIN1');
SELECT * FROM japanese_table ORDER BY convert(t, 'EUC_JP');

2) A user has a UTF-8 database but unfortunately his OS's UTF-8 locale
is broken. He decides to use the C locale and wants to sort the result
from SELECT like this:

SELECT * FROM japanese_table ORDER BY convert(t, 'EUC_JP');

Note that sorting by UTF-8 physical (byte) order would produce results
that look random to the user. So the following would not help him in
this case:

SELECT * FROM japanese_table ORDER BY t;
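
To illustrate the point about physical order, here is a minimal sketch in
Python (not PostgreSQL code). It assumes that a C-locale comparison is a
plain byte-by-byte comparison of the encoded string. The three sample kanji
and the helper name byte_order are hypothetical and chosen only for
illustration.

```python
# Hypothetical sample data: three kanji picked only for illustration.
words = ["一", "亜", "愛"]


def byte_order(items, encoding):
    """Sort strings by their encoded bytes.

    This mimics a C-locale (bytewise) comparison of the value as it would
    be stored in the given encoding. It is an assumption-based sketch, not
    how the server implements ORDER BY.
    """
    return sorted(items, key=lambda s: s.encode(encoding))


# Order when the bytes compared are UTF-8 (follows Unicode code points).
utf8_order = byte_order(words, "utf-8")

# Order when the bytes compared are EUC_JP (follows JIS X 0208 code order).
eucjp_order = byte_order(words, "euc_jp")

print("UTF-8 byte order: ", utf8_order)
print("EUC_JP byte order:", eucjp_order)
```

The two lists come out in different orders, because JIS X 0208 arranges
these kanji differently from Unicode. That is why the user in case 2) wants
to sort on the EUC_JP form rather than on t directly.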

Also I don't understand how this is different from the problem we have
when a message catalogue does not match the encoding.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

In response to

Re: Bug in UTF8-Validation Code? at 2007-04-04 15:11:20 from Tom Lane

Responses

Re: Bug in UTF8-Validation Code? at 2007-04-05 01:35:19 from Andrew - Supernews
