Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?

From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: 'Andrew Dunstan' <andrew(at)dunslane(dot)net>, 'Merlin Moncure' <mmoncure(at)gmail(dot)com>, 'Joey Adams' <joeyadams3(dot)14159(at)gmail(dot)com>, 'PavelStehule' <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-02 00:41:14
Message-ID: 1335919274.3106.343.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2012-05-01 at 18:35 -0400, David Johnston wrote:
> > -----Original Message-----
> > From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> > owner(at)postgresql(dot)org] On Behalf Of Hannu Krosing
> > Sent: Tuesday, May 01, 2012 5:29 PM
> >
> > The reason I am whining now is that with minor adjustments in
> > implementation it could all be made much more powerful (try cast to ::json
> > for values before cast to ::text) and much more elegant thanks to
> > PostgreSQL's built in casting.
> >
> > If we allowed json to hold any "JSON value" and tried ::json when generating
> > json for compound types than we would be able to claim that PostgreSQL
> > supports JSON everywhere, defaulting to representing officially unsupported
> > types as strings, but allowing users to convert these to their preferred
> > conventions.
>
> I get that a JSON Text is always also a JSON Value but the reverse is not true.
> Thus, if we define JSON to be JSON Value we cannot guarantee that the encoded
> value is a possible JSON Text - the most important property for purposes of
> data interchange.

Nope, the most important property for purposes of data interchange is
that we produce something that the client expects and can understand
without too much extra work on client side.

The way to "guarantee" JSON Text is to encode objects that produce it.

I see nothing wrong with returning either a complex JSON object of
simply null if the object could not be found.

> > I'd also prefer to have default conversions already included for some of our
> > sexier types, like intervals (just a two element array) and hstore (an object)
> > etc.
>
> Interval is not simply 2 values but also denotes whether the particular value
> is inclusive or exclusive; you would have to use an object unless you transmit
> in a text format and let the target perform the necessary interpretation of
> the string.

if you need that info to be passed to _your_ caller you just define a
ned cast for youtr interval-to-json which returns "object" notation.

It was meant as a sample of what could be included by having generic
json values and using json casts.

> > Suddenly we would be the best match database for Web development and
> > all things Ajax and also have a widely used built in and adjustable interchange
> > format to outer world.
> >
> > > Second, RFC 4627 is absolutely clear: a valid JSON value can only be
> > > an object or an array, so this thing about converting arbitrary datum
> > > values to JSON is a fantasy.
> >
> > Probably a typo on your part - valid "JSON _text_" is object or array, valid
> > "JSON value" can also be number, text, true, false and null
> >
> > What I am arguing for is interpreting our json type as representing a "JSON
> > value" not "JSON text", this would enable users to adjust and extend the
> > generation of json values via defining casts for their specific types - most
> > notably Date* types but also things like hstore, which has a natural JSON
> > representation as "object" (a list of key:value pairs for non-js users, a.k.a. a
> > dictionary, hash, etc.)
>
> Aside from the fact it is likely too late to change the interpretation I would argue against doing so in any case.
>
> Currently, the idea is to get your result all lined up and ready to go and
> then ship it off to the caller as valid JSON so that the caller does not
> have to do so itself.

"Valid JSON" is only a small part of the equation, as I quoted before
from "JSON: The Fat-Free Alternative to XML" at
http://www.json.org/fatfree.html

"JSON has no validator. Being well-formed and valid is not the same as
being correct and relevant. Ultimately, every application is responsible
for validating its inputs."

If we produce correct JSON text for things that are converible to JSON
text then we should be free to produce JSON values for simple value,
like everybody else (that is Javascript , python, ruby, ...)

I don't think it is postgreSQL's business to start educating people
about "correct" way to do JSON serialisation when everybody else does it
the generic way.

> Answering the question "what would this value look like if it was part of
> a json output?" is good; however, production use is likely to mostly care
> about the entire json interchange construct (i.e., JSON Text)

the "what would it look like" part is important for values that are not
covered by standard and are thus encoded as text. These need to follow
conventions outside the JSON spec proper, and thus may need to be
adjusted by the developer.

Doing it via ::json casts would be the cleanest and simplest way to deal
with it.

> So: json -> json_text;
>
> A JSON Value always has a textual representation but if we were to have an
> actual type it would make sense to encode it such that (strings, objects
> and arrays) are delimited while (numbers, false, true, and null) are not.

And so it is, what are you trying to say here ?

> Type Name: json_value
>
> Output Representations (all output surrounded by double-quotes since all are string-like) -
> String: "'VALUE'" (single-quote delimiter)

strings are surrounded by "" not ''

> Object: "{...}"
> Array: "[]"
> Number: "0.00"

"0.00" is a sting according to JSON spec

also, objects and arrays don't have surrounding ""

> Other: "false", "true", "null"

that is not what the standard says - "false" is not the same as false
without quotes - the first is string 'false', the second one is boolean
untrue

> JSON is fundamentally an interchange format (especially from a database's perspective).
> JSON Values only really have meaning if they are attached explicitly to a JSON Text
> structure, if you wanted to store one independently you should convert it into a
> native representation first. The few exceptions to this would be sufficiently
> handled via plain text with meta-data indicating that the stored value is structured
> in directly JSON compatible syntax.

This is something that each developer could define for his specific app
by writing an appropriate ::json cast function for the datatype, if such
casts were honoured while doing the eoutput conversion.

There is no one standard for mosts "outside jsons scope" datatypes,

> In short, the default context for JSON in
> PostgreSQL should JSON Text (not JSON Value) and thus the unadorned "json" should
> reflect this default (which it does).

I don't think postgreSQL has hierarchical types, so that json_text would
also be json_value, but not the other way.

I'm perfectly happy with us being flexible enough to allow people to
also generate non-standard JSON - as some of them most likely are now -
if it makes the whole system cleaner and easier to use.

I don't see why PostgreSQL's JSON should be more restrictive than most
others.

> > > If anything, we should adjust the JSON input routines to disallow
> > > anything else, rather than start to output what is not valid JSON.
> >
> > I tested python, ruby and javascript in firefox and chrome, all their JSON
> > generators generate 1 for standalone integer 1 and "a" for standalone string
> > a , and none refused to convert either to JSON.
> >
>
> Assume that we keep the meaning of json to be JSON Text; what would you
> suggest occurs if someone attempts a datum -> json cast?

I would not assume such thing ;)

> Given that we are working in a strongly-typed environment the meaning of
> JSON cannot be changed and so either the cast has to output valid JSON Text
> or it has to fail.

Most people don't work in strongly-typed environment, and thus would
work around such restriction if they need a simple JSON value at the
other end of the interchange.

They would either do like I did and use array_to_json(array[datum]) and
then strip off the [] before shipping the JSON in the most likely case
that receiver side _wants_ to get JSON value and has been getting a JSON
value from other partners, or it may decode it as array and then use the
only element.

> My personal take it is have it fail since any arbitrary decision to cast
> to JSON Text is going to make someone unhappy and supposedly they can
> modify their query so that the result generates whatever format they desire.

Do you actually have such an experience or is it just a wild guess ?

Would the one being "unhappy" be some developer who is actually using
JSON, or somebody whoi has only academic interest in RFCs ?

> I haven't followed the JSON development in 9.2 too closely but exposing
> whatever conversion mechanism is currently used to generate JSON makes
> sense from a ease-of-development standpoint. But even then, during
> development passing around true JSON Text is not a big deal and then
> no "JSON_Value" API has to be exposed; thus it can be freely refined,

The use of casts to json would be a very natural way to make all
postgreSQL type exportable to correct JSON and would give the control
over the exact conversion details to the developer.

Currently we blindly use for json value whatever the text format of the
field happens to be with no choice other than either do the whole
conversion ourselves or then accept that some datums have unsuitable
formats and need to do some ugly and possibly error-prone conversions on
the client.

that way madness lies - I'm pretty sure I don't want to do any of that.

I want my database do the conversion for standard tyoes and give me a
choice to change the conversion for _only_ the types I need,

Using type specific casts to json give me exactly this.

> along with related behavior - e.g., append_to_json(value json_value, location text), in 9.3
>
> So, in short, all of your ideas are still valid but use "json_value"
> for the data type. But, even them my guess is that you would rarely
> use json_value as a column type whereas you would frequently use json
> (JSON Text) for one.

> json_value would be a support type to facilitate working with json in
> a procedural-like way.

If you are really paranoid about somebody returning json value out of
postgresql why not just use a filter function which would fail if the
argument is not an array or text

define fail_if_not_json_text(json) returns json

this gives the developers the choice to still return JSON Values to
clients if they need.

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2012-05-02 00:53:27 Have we out-grown Flex?
Previous Message David Johnston 2012-05-02 00:32:58 Re: proposal: additional error fields