Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 23:11:02
Message-ID: 23251.1335913862@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On Tue, May 1, 2012 at 9:56 AM, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>> No, the RFC says (emphasis mine):
>>
>> A JSON *text* is a serialized object or array.
>>
>> If we let the JSON type correspond to a *value* instead, this
>> restriction does not apply, and the JSON type has a useful recursive
>> definition.

> I think you're playing with words. But in any case, the RFC says this
> regarding generators:
> 5. Generators
> A JSON generator produces JSON text. The resulting text MUST
> strictly conform to the JSON grammar.

I read over the RFC, and I think the only reason why they restricted
JSON texts to represent just a subset of JSON values is this cute
little hack in section 3 (Encoding):

    Since the first two characters of a JSON text will always be ASCII
    characters [RFC0020], it is possible to determine whether an octet
    stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
    at the pattern of nulls in the first four octets.

        00 00 00 xx  UTF-32BE
        00 xx 00 xx  UTF-16BE
        xx 00 00 00  UTF-32LE
        xx 00 xx 00  UTF-16LE
        xx xx xx xx  UTF-8

They need a guaranteed 2 ASCII characters to make that work, and
they won't necessarily get that many with a bare string literal.
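
As an illustration of that null-pattern trick, here is a minimal Python
sketch. The function name detect_json_encoding is hypothetical and is not
part of the RFC or of PostgreSQL; it simply maps the pattern of zero octets
in the first four bytes onto the table quoted above, and returns "unknown"
when no row matches.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from the nulls in its first four octets.

    Hypothetical helper for illustration; it mirrors the RFC table and
    assumes the input has no byte-order mark.
    """
    if len(data) < 4:
        raise ValueError("need at least four octets")
    # True where the octet is a null, False where it is non-zero ("xx").
    nulls = tuple(b == 0 for b in data[:4])
    table = {
        (True, True, True, False): "UTF-32BE",
        (True, False, True, False): "UTF-16BE",
        (False, True, True, True): "UTF-32LE",
        (False, True, False, True): "UTF-16LE",
        (False, False, False, False): "UTF-8",
    }
    return table.get(nulls, "unknown")


# An array starts with two ASCII characters, so the pattern is recognizable.
array_guess = detect_json_encoding("[1, 2]".encode("utf-16-be"))

# A bare string literal may have a non-ASCII second character, which
# leaves no null octet where the table expects one.
string_guess = detect_json_encoding('"中"'.encode("utf-16-be"))
```

With an object or array the first two characters are always ASCII, so the
lookup succeeds; with the bare string literal the second character has a
non-zero high byte and the pattern matches no row of the table.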

Since for our purposes there is not, and never will be, any need to
figure out whether a JSON input string is encoded in UTF-16 or UTF-32,
I find myself agreeing with the camp that says we might as well consider
that our JSON type corresponds to JSON values, not JSON texts. I also
notice that json_in() seems to believe that already.
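
To make the value/text distinction concrete, here is a small Python sketch
using the standard json module, whose parser accepts any JSON value at the
top level. The helper names is_json_value and is_json_text are hypothetical,
chosen only for this illustration; they do not describe how json_in() is
implemented.

```python
import json


def is_json_value(s: str) -> bool:
    """True if s parses as any JSON value (scalar, array, or object)."""
    try:
        json.loads(s)
    except ValueError:
        return False
    return True


def is_json_text(s: str) -> bool:
    """True only if s parses and its top level is an object or array,
    which is the stricter "JSON text" definition quoted from the RFC."""
    try:
        value = json.loads(s)
    except ValueError:
        return False
    return isinstance(value, (dict, list))


# A bare string literal is a JSON value but not a JSON text.
bare_string = '"hello"'
value_ok = is_json_value(bare_string)
text_ok = is_json_text(bare_string)
```

Under the "value" reading, a bare string or number is acceptable input;
under the "text" reading, only a top-level object or array is.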

However, that doesn't mean I'm sold on the idea of getting rid of
array_to_json and row_to_json in favor of a universal "to_json()"
function. In particular, both of those have optional "pretty_bool"
arguments that don't fit nicely at all in a generic conversion
function. The meaning of that flag is very closely tied to the
input being an array or record respectively.

I'm inclined to leave these functions as they are, and consider
adding a universal "to_json(anyelement)" (with no options) later.
Because it would not have options, it would not be meant to cover
cases where there's value in formatting or conversion options;
so it wouldn't render the existing functions entirely obsolete,
nor would it mean there would be no need for other specialized
conversion functions.

regards, tom lane
