Re: JSON for PG 9.2

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 18:08:03
Message-ID: CA+Tgmoa9GLx06S5KiG7YgF_T+3QkY+Dfq9RF2g79STM=LEn1_A@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 20, 2012 at 12:14 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
> On Jan 20, 2012, at 8:58 AM, Robert Haas wrote:
>
>> If, however,
>> we're not using UTF-8, we have to first turn \uXXXX into a Unicode
>> code point, then convert that to a character in the database encoding,
>> and then test for equality with the other character after that.  I'm
>> not sure whether that's possible in general, how to do it, or how
>> efficient it is.  Can you or anyone shed any light on that topic?
>
> If it’s like the XML example, it should always represent a Unicode code point, and *not* be converted to the other character set, no?

Well, you can pick which way you want to do the conversion. If the
database encoding is SJIS, and there's an SJIS character in a string
that gets passed to json_in(), and there's another string which also
gets passed to json_in() which contains \uXXXX, then any sort of
canonicalization or equality testing is going to need to convert the
SJIS character to a Unicode code point, or the Unicode code point to
an SJIS character, to see whether they match.

Err, actually, now that I think about it, that might be a problem:
what happens if we're trying to test two characters for equality and
the encoding conversion fails? We really just want to return false -
the strings are clearly not equal if either contains even one
character that can't be converted to the other encoding - so it's not
good if an error gets thrown in there anywhere.
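A minimal Python sketch of the "return false instead of throwing" behavior follows. It assumes Python's "shift_jis" codec as a stand-in for the server's SJIS conversion, and it takes the JSON string as an already-decoded sequence of code points. The function name is hypothetical. A real implementation would have to use PostgreSQL's own conversion routines, which this sketch does not model.

```python
from typing import Sequence


def json_string_equals_sjis(codepoints: Sequence[int], sjis_bytes: bytes) -> bool:
    """Compare a decoded JSON string (as code points) with SJIS-encoded text.

    Hypothetical helper: any conversion failure is treated as "not equal"
    rather than raised as an error.
    """
    try:
        converted = "".join(chr(cp) for cp in codepoints).encode("shift_jis")
    except ValueError:
        # UnicodeEncodeError is a subclass of ValueError, so this covers code
        # points with no Shift_JIS mapping as well as chr()'s out-of-range error.
        return False
    return converted == sjis_bytes


# U+AC00 (a Hangul syllable) has no Shift_JIS mapping, so this yields
# False instead of raising.
print(json_string_equals_sjis([0xAC00], b"\x82\xa0"))
print(json_string_equals_sjis([0x3042], b"\x82\xa0"))
```

Catching the failure at the conversion step keeps the equality test total: a string containing an unconvertible character simply compares unequal to everything in the other encoding.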

> At any rate, since the JSON standard requires UTF-8, such distinctions having to do with alternate encodings are not likely to be covered, so I suspect we can do whatever we want here. It’s outside the spec.

I agree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
