Re: lower and upper not UTF-8 safe

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Julian Satchell <j(dot)satchell(at)eris(dot)qinetiq(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: lower and upper not UTF-8 safe
Date: 2003-08-04 21:03:02
Message-ID: 11538.1060030982@sss.pgh.pa.us
Lists: pgsql-hackers

Julian Satchell <j(dot)satchell(at)eris(dot)qinetiq(dot)com> writes:
> The implementations of lower and upper in
> src/backend/utils/adt/oracle_compat.c use the single byte macros from
> ctype.h to alter individual bytes in the text string.

> If the text is UTF-8 encoded this is totally wrong, and will result in
> an invalid string that is no longer UTF-8.

Only if you use a locale that is assuming a character set that is not
UTF8 but does have characters with the high bit set. I'm not sure that
we can do anything to defend against locale/charset mismatch.

regards, tom lane
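
[Editor's sketch] The mismatch described in the reply above can be illustrated with a small, self-contained C example. It is not the code from oracle_compat.c; the helpers `latin1_tolower`, `ascii_tolower`, `map_bytes`, `utf8_well_formed`, and `demo_locale_mismatch` are hypothetical names, and the two mapping functions are hand-written stand-ins for what `tolower()` would plausibly do under an ISO-8859-1 locale and under the "C" locale, so that the example does not depend on which locales are installed. The validity check is structural only (lead byte plus continuation bytes) and ignores overlong forms and surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for tolower() under an ISO-8859-1 locale. */
static unsigned char latin1_tolower(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return (unsigned char) (c + 0x20);
    /* 0xC0..0xDE are uppercase letters in Latin-1, except 0xD7. */
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
        return (unsigned char) (c + 0x20);
    return c;
}

/* Hypothetical stand-in for tolower() under the "C" locale: ASCII only. */
static unsigned char ascii_tolower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char) (c + 0x20) : c;
}

/* Apply a per-byte mapping in place, as a byte-oriented lower() would. */
static void map_bytes(unsigned char *s, size_t len,
                      unsigned char (*fn)(unsigned char))
{
    for (size_t i = 0; i < len; i++)
        s[i] = fn(s[i]);
}

/* Structural UTF-8 check: each lead byte must be followed by the
 * right number of continuation bytes (10xxxxxx). Simplified: no
 * overlong or surrogate checks. Returns 1 if well formed, else 0. */
static int utf8_well_formed(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        int need;

        if (s[i] < 0x80)
            need = 0;
        else if ((s[i] & 0xE0) == 0xC0)
            need = 1;
        else if ((s[i] & 0xF0) == 0xE0)
            need = 2;
        else if ((s[i] & 0xF8) == 0xF0)
            need = 3;
        else
            return 0;           /* stray continuation or invalid lead byte */
        i++;
        for (; need > 0; need--) {
            if (i >= len || (s[i] & 0xC0) != 0x80)
                return 0;
            i++;
        }
    }
    return 1;
}

/* Lowercase the UTF-8 bytes of "CAFÉ" (É is 0xC3 0x89) with both rules.
 * Returns 1 if the Latin-1 rule broke the string while the ASCII-only
 * rule left it well formed. */
int demo_locale_mismatch(void)
{
    const unsigned char src[] = { 'C', 'A', 'F', 0xC3, 0x89 };
    unsigned char with_latin1[sizeof src];
    unsigned char with_ascii[sizeof src];

    assert(utf8_well_formed(src, sizeof src));
    memcpy(with_latin1, src, sizeof src);
    memcpy(with_ascii, src, sizeof src);

    map_bytes(with_latin1, sizeof src, latin1_tolower);  /* 0xC3 becomes 0xE3 */
    map_bytes(with_ascii, sizeof src, ascii_tolower);    /* high bytes untouched */

    return !utf8_well_formed(with_latin1, sizeof src)
        && utf8_well_formed(with_ascii, sizeof src);
}
```

Under the Latin-1 rule the lead byte 0xC3 is treated as an uppercase letter and rewritten to 0xE3, which announces a three-byte sequence that never arrives, so the result is no longer UTF-8. Under the ASCII-only rule the bytes with the high bit set are left alone: the string stays well formed, although the É itself is not lowercased.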
