Re: lower and upper not UTF-8 safe

From:	Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Julian Satchell <j(dot)satchell(at)eris(dot)qinetiq(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: lower and upper not UTF-8 safe
Date:	2003-08-05 06:58:50
Message-ID:	20030805065850.GA12563@zf.jcu.cz
Lists:	pgsql-hackers

On Mon, Aug 04, 2003 at 05:03:02PM -0400, Tom Lane wrote:
> Julian Satchell <j(dot)satchell(at)eris(dot)qinetiq(dot)com> writes:
> > The implementations of lower and upper in
> > src/backend/utils/adt/oracle_compat.c use the single byte macros from
> > ctype.h to alter individual bytes in the text string.
>
> > If the text is UTF-8 encoded this is totally wrong, and will result in
> > an invalid string that is no longer UTF-8.
>
> Only if you use a locale that is assuming a character set that is not
> UTF8 but does have characters with the high bit set. I'm not sure that
> we can do anything to defend against locale/charset mismatch.

We could try to detect the locale's typical charset, compare it with the
actual charset used in the DB, and send a NOTICE to the FE if they don't
match. The problem is the portability of the charset detection code, because
there are differences between operating systems. The best case is when libc
supports the nl_langinfo(CODESET) call. You can find complete charset
detection code in libcharset or glib (I use a simplified version of that
code and it's 300 lines :-).
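
A minimal sketch of the idea, assuming a POSIX libc that provides
nl_langinfo(CODESET). The function names and the way the database encoding
is passed in (as a plain string) are hypothetical and are not taken from the
PostgreSQL source; a real check would also need an alias table, since the
same charset is reported under different names on different systems.

```c
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Compare two charset names, ignoring case and the separators '-' and '_',
 * so that "UTF-8", "utf8" and "UTF_8" are treated as the same name.
 * Hypothetical helper; it does not resolve aliases such as "UNICODE".
 */
static int
charset_names_match(const char *a, const char *b)
{
    for (;;)
    {
        while (*a == '-' || *a == '_')
            a++;
        while (*b == '-' || *b == '_')
            b++;
        if (*a == '\0' || *b == '\0')
            return *a == *b;    /* match only if both names ended */
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
}

/*
 * Returns 1 if the codeset implied by the environment's locale matches
 * db_encoding, 0 if it differs (a notice is printed to stderr), and -1 if
 * the codeset could not be determined.
 */
static int
locale_matches_encoding(const char *db_encoding)
{
    const char *codeset;

    /* Adopt the LC_CTYPE setting from the environment. */
    if (setlocale(LC_CTYPE, "") == NULL)
        return -1;

    codeset = nl_langinfo(CODESET);
    if (codeset == NULL || codeset[0] == '\0')
        return -1;

    if (!charset_names_match(codeset, db_encoding))
    {
        fprintf(stderr,
                "NOTICE: locale charset \"%s\" does not match database encoding \"%s\"\n",
                codeset, db_encoding);
        return 0;
    }
    return 1;
}
```

Calling locale_matches_encoding("UTF8") at startup would print the notice
when, for example, LC_CTYPE selects an ISO-8859-1 locale. Systems without
nl_langinfo(CODESET) would need a fallback, such as parsing the locale name,
which is where most of the extra lines come from.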

Karel

--
Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
http://home.zf.jcu.cz/~zakkr/

In response to

Re: lower and upper not UTF-8 safe at 2003-08-04 21:03:02 from Tom Lane

Responses

Re: lower and upper not UTF-8 safe at 2003-08-05 13:45:50 from Tom Lane
