On Fri, Jan 20, 2012 at 12:14 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
> On Jan 20, 2012, at 8:58 AM, Robert Haas wrote:
>> If, however,
>> we're not using UTF-8, we have to first turn \uXXXX into a Unicode
>> code point, then convert that to a character in the database encoding,
>> and then test for equality with the other character after that. I'm
>> not sure whether that's possible in general, how to do it, or how
>> efficient it is. Can you or anyone shed any light on that topic?
> If it’s like the XML example, it should always represent a Unicode code point, and *not* be converted to the other character set, no?
Well, you can pick which way you want to do the conversion. If the
database encoding is SJIS, and there's an SJIS character in a string
that gets passed to json_in(), and there's another string which also
gets passed to json_in() which contains \uXXXX, then any sort of
canonicalization or equality testing is going to need to convert the
SJIS character to a Unicode code point, or the Unicode code point to
an SJIS character, to see whether they match.
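
Not PostgreSQL code, but a quick Python sketch of what that comparison has to do (function names and the direction of conversion are my own choice here):

```python
# Sketch: comparing a character stored in the database encoding (Shift JIS
# in this example) against a JSON \uXXXX escape by converting both to
# Unicode code points first.

def json_escape_to_codepoint(escape: str) -> int:
    """Turn a four-hex-digit \\uXXXX escape into a Unicode code point."""
    assert escape.startswith("\\u") and len(escape) == 6
    return int(escape[2:], 16)

def sjis_char_equals_escape(sjis_bytes: bytes, escape: str) -> bool:
    """Decode the SJIS character to Unicode, then compare code points."""
    ch = sjis_bytes.decode("shift_jis")
    return ord(ch) == json_escape_to_codepoint(escape)

# U+3042 (HIRAGANA LETTER A) is 0x82 0xA0 in Shift JIS.
print(sjis_char_equals_escape(b"\x82\xa0", "\\u3042"))  # True
```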
Err, actually, now that I think about it, that might be a problem:
what happens if we're trying to test two characters for equality and
the encoding conversion fails? We really just want to return false -
the strings are clearly not equal if either contains even one
character that can't be converted to the other encoding - so it's not
good if an error gets thrown in there anywhere.
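
In sketch form (again just illustrative Python, not what the backend would actually do), the comparison needs to swallow the conversion failure and treat it as inequality:

```python
# Sketch: equality that treats a failed encoding conversion as "not equal"
# rather than raising an error. A code point with no Shift JIS
# representation simply can't match any SJIS-encoded character.

def codepoint_equals_sjis(codepoint: int, sjis_bytes: bytes) -> bool:
    try:
        converted = chr(codepoint).encode("shift_jis")
    except UnicodeEncodeError:
        return False  # no SJIS equivalent: the characters can't be equal
    return converted == sjis_bytes

print(codepoint_equals_sjis(0x3042, b"\x82\xa0"))  # True: both are U+3042
print(codepoint_equals_sjis(0x0100, b"\x82\xa0"))  # False: U+0100 has no SJIS form
```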
> At any rate, since the JSON standard requires UTF-8, such distinctions having to do with alternate encodings are not likely to be covered, so I suspect we can do whatever we want here. It’s outside the spec.
The Enterprise PostgreSQL Company