Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON

From: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bernd Helmle <mailings(at)oopsware(dot)de>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>
Subject: Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON
Date: 2011-07-22 22:04:49
Message-ID: CAARyMpDb2ZQZ8xZ1uwZHa_rf+FP+cFKK-xiUs1ELsFoE4Wea2A@mail.gmail.com
Lists: pgsql-hackers

I think I've decided to allow \u escapes of non-ASCII characters only
when the database encoding is UTF8. For example, $$"\u2013"$$::json
will fail if the database encoding is WIN1252, even though WIN1252 can
encode U+2013 (EN DASH). This may be somewhat draconian, given that:

* SQL_ASCII can otherwise handle "any" language according to the documentation.

* The XML type doesn't have this restriction (it just stores the
input text verbatim, and converts it to UTF-8 before doing anything
complicated with it).

However, it's simple to implement and understand. The JSON data type
will not perform any automatic conversion between character encodings.
Also, if we want to handle this any better in the future, we won't
have to support legacy data containing a mixture of encodings.
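A minimal Python sketch of the proposed input check follows. It assumes the text has already passed JSON syntax validation and that the database encoding is passed in by its PostgreSQL name. `check_unicode_escapes` and `JsonEncodingError` are hypothetical names for illustration, not code from the patch.

```python
import re

# Hypothetical helper, not part of PostgreSQL: matches one backslash escape,
# either \uXXXX or a backslash followed by any single character.
_ESCAPE_RE = re.compile(r'\\(u[0-9A-Fa-f]{4}|.)', re.DOTALL)


class JsonEncodingError(ValueError):
    pass


def check_unicode_escapes(json_text: str, db_encoding: str) -> None:
    # Assumes json_text is otherwise valid JSON; only escapes are inspected.
    if db_encoding == "UTF8":
        return  # every escape can later be converted to a real character
    for m in _ESCAPE_RE.finditer(json_text):
        esc = m.group(1)
        # Escapes of ASCII code points (e.g. \u0041) are allowed everywhere;
        # an escaped backslash ("\\") is consumed whole, so "\\u2013" is not
        # mistaken for a \u escape.
        if esc[0] == "u" and len(esc) == 5 and int(esc[1:], 16) >= 0x80:
            raise JsonEncodingError(
                f"\\{esc} is not allowed when the database encoding is {db_encoding}"
            )
```

Under this rule $$"\u2013"$$ passes in a UTF8 database and is rejected in a WIN1252 one, while $$"\u0041"$$ is accepted in both.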

In the future, we could create functions to compensate for the issues
people encounter; for example:

* json_escape_unicode(json [, replace bool]) returns text -- convert
non-ASCII characters to escapes. Optionally, use \uFFFD for
unconvertible characters.
* json_unescape_unicode(text [, replace text]) returns json -- like
json_in, but convert Unicode escapes to characters when possible.
Optionally, replace unconvertible characters with a given string.
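The two proposed helpers could be sketched in Python roughly as follows. The function names come from the list above, but the signatures are simplified assumptions: `codec` stands in for the database encoding using a Python codec name (e.g. `cp1252` for WIN1252), and the `replace` flag of json_escape_unicode is omitted because a Python `str` has no unconvertible characters.

```python
import re


def json_escape_unicode(json_text: str) -> str:
    # Hypothetical sketch: replace every non-ASCII character with a \uXXXX
    # escape. Valid JSON only contains non-ASCII characters inside strings,
    # so rewriting the whole text is safe. Characters above U+FFFF become
    # UTF-16 surrogate pairs, as JSON requires.
    out = []
    for ch in json_text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:
            cp -= 0x10000
            out.append("\\u%04x\\u%04x" % (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)))
    return "".join(out)


# Alternatives, in order: a surrogate pair, a single \uXXXX, any other escape.
_UNICODE_ESCAPE_RE = re.compile(
    r'\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})'
    r'|\\u([0-9a-fA-F]{4})|\\(.)',
    re.DOTALL,
)


def json_unescape_unicode(text: str, codec: str, replace=None) -> str:
    # Hypothetical sketch: turn \u escapes of non-ASCII characters back into
    # characters when `codec` can encode them; otherwise substitute `replace`,
    # or raise ValueError if no replacement was given.
    def sub(m):
        if m.group(1):
            hi, lo = int(m.group(1), 16), int(m.group(2), 16)
            cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
        elif m.group(3):
            cp = int(m.group(3), 16)
        else:
            return m.group(0)  # \n, \", \\ and friends are kept verbatim
        if cp < 0x80 or 0xD800 <= cp <= 0xDFFF:
            return m.group(0)  # ASCII escapes and lone surrogates stay escaped
        ch = chr(cp)
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if replace is None:
                raise ValueError(f"U+{cp:04X} cannot be represented in {codec}")
            return replace
        return ch

    return _UNICODE_ESCAPE_RE.sub(sub, text)
```

Escapes of ASCII characters (such as \u0022) are deliberately left alone, since turning them into raw characters could produce invalid JSON; likewise the caller must pick a `replace` string that is legal inside a JSON string.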

I've been going back and forth on how to handle encodings in the JSON
type for a while, but suggestions and objections are still welcome.
However, I plan to proceed in this direction so progress can be made.

On another matter, should the JSON type guard against duplicate member
keys? The JSON RFC says "The names within an object SHOULD be
unique," meaning JSON with duplicate members can be considered valid.
JavaScript interpreters (the ones I tried), PHP, and Python all have
the same behavior: discard the first member in favor of the second.
That is, {"key":1,"key":2} becomes {"key":2}. The XML type throws an
error if a duplicate attribute is present (e.g. '<a href="b"
href="c"/>'::xml).
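For comparison, Python's standard `json` module shows the last-one-wins behavior described above, and its `object_pairs_hook` parameter gives one way to reject duplicates instead, much as the XML type rejects duplicate attributes. `reject_duplicate_keys` and `loads_strict` are illustrative names; this is a client-side sketch, not a proposal for the JSON type's C code.

```python
import json


def reject_duplicate_keys(pairs):
    # object_pairs_hook receives each object's members as a list of
    # (key, value) pairs in document order, before they become a dict.
    seen = set()
    for key, _ in pairs:
        if key in seen:
            raise ValueError(f"duplicate object key: {key!r}")
        seen.add(key)
    return dict(pairs)


def loads_strict(text):
    # Like json.loads, but fails instead of silently keeping the last member.
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)
```

With the default settings, `json.loads('{"key":1,"key":2}')` yields `{'key': 2}`, whereas `loads_strict` raises ValueError on the same input.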

Thanks for the input,
- Joey
