Re: JSON for PG 9.2

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)toroid(dot)org>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-02-02 13:54:32
Message-ID: CA+TgmoaJbketfb1yiZwWoiiMa92bgBNkDfkyLfFhzcdV=MVqGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 2, 2012 at 4:54 AM, Abhijit Menon-Sen <ams(at)toroid(dot)org> wrote:
> At 2012-02-01 11:28:50 -0500, robertmhaas(at)gmail(dot)com wrote:
>> It's also pretty clear that JSON
>> string -> PG text data type is going to admit of a number of error
>> conditions (transcoding errors and perhaps invalid surrogate pairs) so
>> throwing one more on the pile doesn't cost much.
>
> I'm sorry for being slow, but I don't understand what you're proposing
> to do here (if anything). Could I ask you to explain, please?
>
> Are you talking about allowing the six literal bytes "\u0000" to be
> present in the JSON? If so, I agree, there seems to be no reason to
> disallow it.
>
> Are you also saying we should allow any "\uNNNN" sequence, without
> checking for errors (e.g. invalid surrogate pairs or parts thereof)?
>
> And what transcoding errors are you referring to?

Consider the following JSON object:

"abc"

This is a JSON string. Someone is eventually going to propose a
function with a name like json_to_string() which, when given this
JSON object, returns a three-character string with the PostgreSQL text
type. That's useful and I support it. But now suppose we pass this
JSON object to that same function:

"a\u0062c"

The user will quite rightly expect that since this JSON object
represents the same value as the first JSON object, they're going to
get the same answer back from json_to_string(), namely "abc". So far,
so good. But now suppose we pass this JSON object to that same
function:

"a\u0000c"

This is going to have to be an error condition, because PostgreSQL
does not allow values of type text to contain embedded NUL characters.
Now consider this:

"a\uABCDc"

Suppose that \uABCD represents a character that exists in Unicode, but
the server-encoding is SHIFT-JIS or EUC-JP or some other system which
has no equivalent for the character represented by \uABCD. Again,
when json_to_string() is applied to a value of this type, it must
fail.

In other words, we're knowingly allowing JSON strings to contain
characters which might not be representable as PostgreSQL strings,
because JSON allows any Unicode character, and the server encoding
might not be Unicode, and the server doesn't allow embedded NULs in
any encoding.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message ktm@rice.edu 2012-02-02 14:19:59 Re: Progress on fast path sorting, btree index creation time
Previous Message Robert Haas 2012-02-02 13:45:44 Re: spgist text_ops and LIKE