Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?

From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-02 09:29:29
Message-ID: 1335950969.3106.438.camel@hvost
Lists: pgsql-hackers

On Tue, 2012-05-01 at 19:11 -0400, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > On Tue, May 1, 2012 at 9:56 AM, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>wrote:
> >> No, the RFC says (emphasis mine):
> >>
> >> A JSON *text* is a serialized object or array.
> >>
> >> If we let the JSON type correspond to a *value* instead, this
> >> restriction does not apply, and the JSON type has a useful recursive
> >> definition.
>
> > I think you're playing with words. But in any case, the RFC says this
> > regarding generators:
> > 5. Generators
> > A JSON generator produces JSON text. The resulting text MUST
> > strictly conform to the JSON grammar.
>
> I read over the RFC, and I think the only reason why they restricted
> JSON texts to represent just a subset of JSON values is this cute
> little hack in section 3 (Encoding):
>
> Since the first two characters of a JSON text will always be ASCII
> characters [RFC0020], it is possible to determine whether an octet
> stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
> at the pattern of nulls in the first four octets.
> 00 00 00 xx UTF-32BE
> 00 xx 00 xx UTF-16BE
> xx 00 00 00 UTF-32LE
> xx 00 xx 00 UTF-16LE
> xx xx xx xx UTF-8
>
> They need a guaranteed 2 ASCII characters to make that work, and
> they won't necessarily get that many with a bare string literal.
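For illustration, the null-pattern test quoted above can be sketched in Python like this (a toy sketch of the RFC table; as Tom says, PostgreSQL itself never needs it):

```python
def detect_json_encoding(octets: bytes) -> str:
    """Guess a JSON text's encoding from its first four octets,
    per the table in RFC 4627 section 3.  Only valid when the
    input is a JSON *text* (object or array), so that the first
    two characters are guaranteed to be ASCII."""
    b = octets[:4]
    if len(b) < 4:
        return "UTF-8"  # too short for the table to apply
    nulls = tuple(c == 0 for c in b)
    if nulls == (True, True, True, False):
        return "UTF-32BE"
    if nulls == (True, False, True, False):
        return "UTF-16BE"
    if nulls == (False, True, True, True):
        return "UTF-32LE"
    if nulls == (False, True, False, True):
        return "UTF-16LE"
    return "UTF-8"  # no nulls in the first four octets
```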
>
> Since for our purposes there is not, and never will be, any need to
> figure out whether a JSON input string is encoded in UTF16 or UTF32,
> I find myself agreeing with the camp that says we might as well consider
> that our JSON type corresponds to JSON values not JSON texts. I also
> notice that json_in() seems to believe that already.
>
> However, that doesn't mean I'm sold on the idea of getting rid of
> array_to_json and row_to_json in favor of a universal "to_json()"
> function. In particular, both of those have optional "pretty_bool"
> arguments that don't fit nicely at all in a generic conversion
> function. The meaning of that flag is very closely tied to the
> input being an array or record respectively.

The flags probably should not be tied to a specific type: JSON is
recursive, and the current one-line-per-top-level-element behavior is
quite a limited form of pretty-printing.

I have a table with a field whose type is an array of another table's
row type, and what I currently get with pretty=true is

hannu=# select row_to_json(test3, true) from test3;
-[ RECORD 1 ]---------------------------------------------------------------
row_to_json | {"id":1,
            |  "data3":[{"id":1,"data2":{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},"tstamp":"2012-04-05 13:25:03.644497"},{"id":2,"data2":{"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"},"tstamp":"2012-04-05 13:25:03.644497"}],
            |  "tstamp":"2012-04-16 14:40:15.795947"}

What I would like to get is what Python's pprint does for the same JSON:

>>> pprint(row)
{'id': 1,
 'data3': [{'data2': {'data': '0.262814193032682',
                      'id': 1,
                      'tstamp': '2012-04-05 13:21:03.235204'},
            'id': 1,
            'tstamp': '2012-04-05 13:25:03.644497'},
           {'data2': {'data': '0.157406373415142',
                      'id': 2,
                      'tstamp': '2012-04-05 13:21:05.2033'},
            'id': 2,
            'tstamp': '2012-04-05 13:25:03.644497'}],
 'tstamp': '2012-04-16 14:40:15.795947'}

If we have a pretty flag, why not make it work all the way down the
structure?
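For comparison, Python's json module already behaves this way: the
indent argument is honored at every nesting level, not just for the
top-level keys (a minimal sketch with made-up sample data):

```python
import json

# a cut-down version of the row shown above
row = {"id": 1,
       "data3": [{"id": 1,
                  "data2": {"id": 1, "data": "0.262814193032682"}}],
       "tstamp": "2012-04-16 14:40:15.795947"}

# indent=2 applies all the way down the structure,
# not only to the top-level element
print(json.dumps(row, indent=2))
```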

> I'm inclined to leave these functions as they are, and consider
> adding a universal "to_json(anyelement)" (with no options) later.

To achieve recursive pretty-printing, the better way is to have a
universal to_json(anyelement) with a pretty-printing option

to_json(datum anyelement, indent int)

with the behavior that if indent is NULL or a negative integer, no
pretty-printing is done; if it is 0, printing starts at the left
margin; and if it is a positive integer, then that number of spaces is
added to the left for each row (except the first one) of the JSON
representation.
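Those semantics can be sketched in Python (this to_json and its exact
spacing are illustrative assumptions, not the proposed C
implementation):

```python
import json

def to_json(datum, indent=None):
    """Sketch of the proposed to_json(anyelement, indent int):
    indent None/negative -> compact one-line output;
    indent 0             -> pretty-printed from the left margin;
    indent > 0           -> as above, with every row after the
                            first shifted that many spaces right."""
    if indent is None or indent < 0:
        return json.dumps(datum, separators=(",", ":"))
    pretty = json.dumps(datum, indent=1)
    first, *rest = pretty.splitlines()
    pad = " " * indent
    # the first row stays put so the value can continue a parent's line
    return "\n".join([first] + [pad + line for line in rest])
```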

And it would be overridable for specific types, so that hstore could
provide its own

to_json(datum hstore, indent int)

which would do the correct pretty-printing for the
hstore-as-json-object representation.
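The per-type override could be pictured with something like Python's
single dispatch (the Hstore class and its serialization here are
made-up stand-ins, not the real hstore type):

```python
from functools import singledispatch
import json

@singledispatch
def to_json(datum, indent=None):
    # generic fallback for any type
    return json.dumps(datum)

class Hstore(dict):
    """Stand-in for the hstore key/value type."""

@to_json.register
def _(datum: Hstore, indent=None):
    # hstore supplies its own json-object representation:
    # every key and value rendered as a string
    return json.dumps({str(k): str(v) for k, v in datum.items()})
```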

> Because it would not have options, it would not be meant to cover
> cases where there's value in formatting or conversion options;
> so it wouldn't render the existing functions entirely obsolete,
> nor would it mean there would be no need for other specialized
> conversion functions.

I don't object to the row_to_json() and array_to_json() functions
staying, as a convenience and as the two "official" functions
guaranteed to return "JSON text".

> regards, tom lane

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
