Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON

From: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bernd Helmle <mailings(at)oopsware(dot)de>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON
Date: 2011-07-23 02:36:44
Message-ID: CAARyMpBe8OonqjcmNeEhJLZ6Kf-Ljy_mbEdHtw4K4b=qXtxZ9Q@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 22, 2011 at 7:12 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Hmm.  That's tricky.  I lean mildly toward throwing an error as being
> more consistent with the general PG philosophy.

I agree. Besides, throwing an error on duplicate keys seems like the
most logical thing to do. The most compelling reason not to, I think,
is that it would make the input function a little slower.
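To give a feel for what the duplicate-key check costs, here is a minimal sketch in Python rather than in the module's C. The names reject_duplicate_keys and parse_strict are made up for illustration; the only library feature it relies on is the object_pairs_hook argument of json.loads. The extra work is one dictionary lookup per object member.

```python
import json


def reject_duplicate_keys(pairs):
    """Build a dict from (key, value) pairs, raising on a repeated key.

    Hypothetical helper: json.loads calls it once per JSON object.
    """
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in JSON object: {key!r}")
        seen[key] = value
    return seen


def parse_strict(text):
    # object_pairs_hook receives the members in input order, duplicates included.
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)
```

With this, parse_strict('{"a": 1, "a": 2}') raises ValueError, whereas plain json.loads would silently keep the last value.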

On Fri, Jul 22, 2011 at 8:26 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> * The XML type doesn't have this restriction (it just stores the
>> input text verbatim, and converts it to UTF-8 before doing anything
>> complicated with it).
>
> Yeah. But the price the XML type pays for that is the lack of an
> equality operator.

Interesting. This leads to a couple more questions:

* Should the JSON data type (eventually) have an equality operator?
* Should the JSON input function alphabetize object members by key?

If we canonicalize strings and numbers and alphabetize object members,
then our equality function is just texteq. The only stumbling block
is canonicalizing numbers. Fortunately, JSON's definition of a
"number" is its decimal syntax, so the algorithm is child's play:

* Figure out the digits and exponent.
* If the exponent is greater than 20 or less than -6 (arbitrary), use
exponential notation.
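The two steps above could be sketched roughly as follows. This is Python for brevity, not the module's actual code. It assumes the lower threshold is meant to be -6, takes "exponent" to mean the scientific-notation exponent of the first significant digit, and maps every spelling of zero (including -0) to "0". The name canonical_number is hypothetical.

```python
import re

# JSON number grammar: optional minus, integer part, optional fraction,
# optional exponent.
_JSON_NUMBER = re.compile(
    r"(-?)(0|[1-9][0-9]*)(?:\.([0-9]+))?(?:[eE]([+-]?[0-9]+))?"
)


def canonical_number(text: str) -> str:
    """Return one canonical spelling for a JSON number, working on text only."""
    m = _JSON_NUMBER.fullmatch(text)
    if m is None:
        raise ValueError(f"not a JSON number: {text!r}")
    sign, int_part, frac_part, exp_part = m.groups()
    frac_part = frac_part or ""
    exponent = int(exp_part) if exp_part else 0

    # Step 1: figure out the digits and the exponent.
    digits = int_part + frac_part
    point = len(int_part) + exponent      # decimal point offset into digits
    stripped = digits.lstrip("0")
    point -= len(digits) - len(stripped)  # dropping leading zeros shifts the point
    digits = stripped.rstrip("0")         # trailing zeros carry no information
    if not digits:
        return "0"                        # assumption: 0, -0, 0.0e5 all become "0"
    sci_exp = point - 1                   # value = d.ddd * 10**sci_exp

    # Step 2: choose the notation (thresholds are arbitrary).
    if sci_exp > 20 or sci_exp < -6:
        mantissa = digits[0] + ("." + digits[1:] if len(digits) > 1 else "")
        body = f"{mantissa}e{sci_exp}"
    elif point <= 0:
        body = "0." + "0" * (-point) + digits
    elif point >= len(digits):
        body = digits + "0" * (point - len(digits))
    else:
        body = digits[:point] + "." + digits[point:]
    return sign + body
```

Under these assumptions, "1.0", "1e0" and "10e-1" all come out as "1", so a plain text comparison treats them as equal.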

The problem is: 2.718282e-1000 won't equal 0, as might be expected,
because the comparison works on the decimal text rather than on a
floating-point value. I doubt this matters much.
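To illustrate the difference, a small Python sketch (not part of the proposed module) compares the same literal read as an IEEE 754 double and as an exact decimal:

```python
from decimal import Decimal

tiny = "2.718282e-1000"

as_float = float(tiny)      # too small for a double, so it underflows to 0.0
as_decimal = Decimal(tiny)  # kept exactly, like a text-based canonical form

float_is_zero = as_float == 0.0    # True
decimal_is_zero = as_decimal == 0  # False
```

Anyone expecting float semantics would see zero here, while a comparison of canonical text sees two different values.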

It would be nice to canonicalize JSON on input, and that's the way I'd
like to go, but two caveats are:

* Input (and other operations) would require more CPU time. Instead
of being able to pass the data through a quick condense function, it'd
have to construct an AST (to sort object members) and re-encode the
JSON back into a string.
* Users, for aesthetic reasons, might not want their JSON members rearranged.
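The parse, sort, and re-encode round trip mentioned in the first caveat can be sketched in a few lines of Python. It uses the standard json module, and canonical_json and json_equal are illustrative names. Note the assumptions: json.loads silently keeps the last of any duplicate keys, and numbers go through Python's own int/float handling rather than the text-based canonicalization discussed above.

```python
import json


def canonical_json(text: str) -> str:
    """Parse into a tree, then re-encode with sorted keys and no whitespace."""
    tree = json.loads(text)  # builds the full tree (the costly part)
    return json.dumps(
        tree,
        sort_keys=True,         # alphabetize object members by key
        separators=(",", ":"),  # drop insignificant whitespace
        ensure_ascii=False,     # one spelling for non-ASCII characters
    )


def json_equal(a: str, b: str) -> bool:
    # Once both sides are canonical, equality is a plain text comparison.
    return canonical_json(a) == canonical_json(b)
```

For example, canonical_json('{ "b": 1, "a": [true, null] }') gives '{"a":[true,null],"b":1}', which also shows the member reordering that some users might object to.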

If, in the future, we add the ability to manipulate large JSON trees
efficiently (e.g. by using an auxiliary table like TOAST does), we'll
probably want unique members, so enforcing them now may be prudent.

- Joey
