Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Hannu Krosing'" <hannu(at)2ndQuadrant(dot)com>, "'Andrew Dunstan'" <andrew(at)dunslane(dot)net>
Cc: "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>, "'Joey Adams'" <joeyadams3(dot)14159(at)gmail(dot)com>, "'PavelStehule'" <pavel(dot)stehule(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 22:35:02
Message-ID: 00b101cd27ea$a75cf4d0$f616de70$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Hannu Krosing
> Sent: Tuesday, May 01, 2012 5:29 PM
>
> The reason I am whining now is that with minor adjustments in
> implementation it could all be made much more powerful (try cast to ::json
> for values before cast to ::text) and much more elegant thanks to
> PostgreSQL's built in casting.
>
> If we allowed json to hold any "JSON value" and tried ::json when generating
> json for compound types than we would be able to claim that PostgreSQL
> supports JSON everywhere, defaulting to representing officially unsupported
> types as strings, but allowing users to convert these to their preferred
> conventions.

I get that a JSON Text is always also a JSON Value but the reverse is not true. Thus, if we define JSON to be JSON Value we cannot guarantee that the encoded value is a possible JSON Text - the most important property for purposes of data interchange.

>
> I'd also prefer to have default conversions already included for some of our
> sexier types, like intervals (just a two element array) and hstore (an object)
> etc.

Interval is not simply 2 values but also denotes whether the particular value is inclusive or exclusive; you would have to use an object unless you transmit in a text format and let the target perform the necessary interpretation of the string.

>
> Suddenly we would be the best match database for Web development and
> all things Ajax and also have a widely used built in and adjustable interchange
> format to outer world.
>
> > Second, RFC 4627 is absolutely clear: a valid JSON value can only be
> > an object or an array, so this thing about converting arbitrary datum
> > values to JSON is a fantasy.
>
> Probably a typo on your part - valid "JSON _text_" is object or array, valid
> "JSON value" can also be number, text, true, false and null
>
> What I am arguing for is interpreting our json type as representing a "JSON
> value" not "JSON text", this would enable users to adjust and extend the
> generation of json values via defining casts for their specific types - most
> notably Date* types but also things like hstore, which has a natural JSON
> representation as "object" (a list of key:value pairs for non-js users, a.k.a. a
> dictionary, hash, etc.)

Aside from the fact it is likely too late to change the interpretation I would argue against doing so in any case.

Currently, the idea is to get your result all lined up and ready to go and then ship it off to the caller as valid JSON so that the caller does not have to do so itself. Answering the question "what would this value look like if it was part of a json output?" is good; however, production use is likely to mostly care about the entire json interchange construct (i.e., JSON Text)

So: json -> json_text;

A JSON Value always has a textual representation but if we were to have an actual type it would make sense to encode it such that (strings, objects and arrays) are delimited while (numbers, false, true, and null) are not.

Type Name: json_value

Output Representations (all output surrounded by double-quotes since all are string-like) -
String: "'VALUE'" (single-quote delimiter)
Object: "{...}"
Array: "[]"
Number: "0.00"
Other: "false", "true", "null"

JSON is fundamentally an interchange format (especially from a database's perspective). JSON Values only really have meaning if they are attached explicitly to a JSON Text structure, if you wanted to store one independently you should convert it into a native representation first. The few exceptions to this would be sufficiently handled via plain text with meta-data indicating that the stored value is structured in directly JSON compatible syntax. In short, the default context for JSON in PostgreSQL should JSON Text (not JSON Value) and thus the unadorned "json" should reflect this default (which it does).

>
> > If anything, we should adjust the JSON input routines to disallow
> > anything else, rather than start to output what is not valid JSON.
>
> I tested python, ruby and javascript in firefox and chrome, all their JSON
> generators generate 1 for standalone integer 1 and "a" for standalone string
> a , and none refused to convert either to JSON.
>

^Assume that we keep the meaning of json to be JSON Text; what would you suggest occurs if someone attempts a datum -> json cast? Given that we are working in a strongly-typed environment the meaning of JSON cannot be changed and so either the cast has to output valid JSON Text or it has to fail. My personal take it is have it fail since any arbitrary decision to cast to JSON Text is going to make someone unhappy and supposedly they can modify their query so that the result generates whatever format they desire.

I haven't followed the JSON development in 9.2 too closely but exposing whatever conversion mechanism is currently used to generate JSON makes sense from a ease-of-development standpoint. But even then, during development passing around true JSON Text is not a big deal and then no "JSON_Value" API has to be exposed; thus it can be freely refined, along with related behavior - e.g., append_to_json(value json_value, location text), in 9.3

So, in short, all of your ideas are still valid but use "json_value" for the data type. But, even them my guess is that you would rarely use json_value as a column type whereas you would frequently use json (JSON Text) for one. json_value would be a support type to facilitate working with json in a procedural-like way.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2012-05-01 23:07:19 Re: proposal: additional error fields
Previous Message Alexander Korotkov 2012-05-01 22:08:30 Re: Patch: add conversion from pg_wchar to multibyte