Re: Unicode support

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Gregory Stark <stark(at)enterprisedb(dot)com>
Subject: Re: Unicode support
Date: 2009-04-14 12:36:35
Message-ID: 200904141536.35866.peter_e@gmx.net
Lists: pgsql-hackers

On Tuesday 14 April 2009 07:07:27 Andrew Gierth wrote:
> FWIW, the SQL spec puts the onus of normalization squarely on the
> application; the database is allowed to assume that Unicode strings
> are already normalized, is allowed to behave in implementation-defined
> ways when presented with strings that aren't normalized, and provision
> of normalization functions and predicates is just another optional
> feature.

Can you name chapter and verse on that?

I see this, for example,

6.27 <numeric value function>

5) If a <char length expression> is specified, then
Case:
a) If the character encoding form of <character value expression> is not UTF8,
UTF16, or UTF32, then let S be the <string value expression>.
Case:
i) If the most specific type of S is character string, then the result is the
number of characters in the value of S.
NOTE 134 — The number of characters in a character string is determined
according to the semantics of the character set of that character string.
ii) Otherwise, the result is OCTET_LENGTH(S).
b) Otherwise, the result is the number of explicit or implicit <char length
units> in <char length expression>, counted in accordance with the definition
of those units in the relevant normatively referenced document.

So SQL redirects the question of character length to the Unicode standard. I
have not been able to find anything there on a quick look, but I'm sure the
Unicode standard has some very specific ideas on this. Note that the matter
of normalization is not mentioned here.
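
To make the question concrete, here is a rough sketch in Python (not
PostgreSQL, and not taken from either standard) of how the answer depends on
which unit is counted, and of how an unnormalized string changes the count.
The helper name lengths() is made up for illustration, and treating "code
points" as the closest analogue of SQL's CHARACTERS unit is an assumption on
my part.

```python
import unicodedata


def lengths(s):
    """Count the same string in three different units (hypothetical helper)."""
    return {
        # Python's len() on str counts Unicode code points.
        "code_points": len(s),
        # Number of bytes in the UTF-8 encoding.
        "utf8_octets": len(s.encode("utf-8")),
        # Number of 16-bit code units in UTF-16 (little-endian, no BOM).
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
    }


# U+00E9 LATIN SMALL LETTER E WITH ACUTE, as a single precomposed code point.
composed = "\u00e9"
# The canonically equivalent decomposed form: "e" followed by U+0301.
decomposed = unicodedata.normalize("NFD", composed)

composed_lengths = lengths(composed)
decomposed_lengths = lengths(decomposed)

print(composed_lengths)
print(decomposed_lengths)
```

Both strings render as the same character, yet every one of the three counts
differs between them, so whether the input is normalized plausibly matters for
whatever a character-length function ends up returning.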
