Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bernd Helmle <mailings(at)oopsware(dot)de>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON
Date: 2011-07-24 18:19:58
Message-ID: 2554A880-9AAB-4822-B920-9C28C614FE97@phlo.org

On Jul24, 2011, at 05:14 , Robert Haas wrote:
> On Fri, Jul 22, 2011 at 10:36 PM, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>> Interesting. This leads to a couple more questions:
>>
>> * Should the JSON data type (eventually) have an equality operator?
>
> +1.

+1.

>> * Should the JSON input function alphabetize object members by key?
>
> I think it would probably be better if it didn't. I'm wary of
> overcanonicalization. It can be useful to have things come back out
> in more or less the format you put them in.

The downside being that we'd then either need to canonicalize in
the equality operator, or live with either no equality operator or
a rather strange one.

Also, if we don't canonicalize now, we (or rather our users) are in
for some pain should we ever decide to store JSON values in some other
form than plain text. Because if we do that, we'd presumably want
to order the members in some predefined way (by their hash value,
for example).

So, from me "+1" for alphabetizing members.
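
A minimal sketch, in Python, of why alphabetizing members reduces equality to a plain text comparison. The names `canonical_json` and `json_equal` are hypothetical, and Python's `json` module only stands in for the input function: it also reparses numbers, which is not what this thread proposes, so the example sticks to string values.

```python
import json

def canonical_json(text: str) -> str:
    """Re-serialize JSON text with object members sorted by key.

    Assumption: sorting by key and dropping insignificant whitespace is
    the whole canonical form. Number handling is out of scope here.
    """
    value = json.loads(text)
    return json.dumps(value, sort_keys=True, separators=(",", ":"))

def json_equal(a: str, b: str) -> bool:
    # Once both sides are canonical, equality is a plain string comparison,
    # which is the role texteq plays in the discussion above.
    return canonical_json(a) == canonical_json(b)

print(canonical_json('{"b": "x", "a": "y"}'))
print(json_equal('{"b": "x", "a": "y"}', '{"a":"y","b":"x"}'))
```

If the input function stored the canonical text directly, the comparison would need no parsing at all; without canonical storage, the equality operator has to do this work on every call.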

>> If we canonicalize strings and numbers and alphabetize object members,
>> then our equality function is just texteq. The only stumbling block
>> is canonicalizing numbers. Fortunately, JSON's definition of a
>> "number" is its decimal syntax, so the algorithm is child's play:
>>
>> * Figure out the digits and exponent.
>> * If the exponent is greater than 20 or less than 6 (arbitrary), use
>> exponential notation.
>>
>> The problem is: 2.718282e-1000 won't equal 0 as may be expected. I
>> doubt this matters much.
>
> I don't think that 2.718282e-100 SHOULD equal 0.

I agree. As for your proposed algorithm, I suggest instead using
exponential notation whenever it produces a shorter textual representation.
In other words, for values between -1 and 1, we'd switch to exponential
notation if there is more than 1 leading zero (to the right of the decimal
point, of course), and for values outside that range if there are more than
2 trailing zeros and no decimal point. All of this applies after redundant
zeros and decimal points are removed. So we'd store

  0      as 0
  1      as 1
  0.1    as 0.1
  0.01   as 0.01
  0.001  as 1e-3
  10     as 10
  100    as 100
  1000   as 1e3
  1000.1 as 1000.1
  1001   as 1001
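
The rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: `canonical_number` is a hypothetical name, the input is assumed to be an already validated JSON number, the exponential form uses an integer mantissa (the mail does not say how multi-digit mantissas should be written), and negative zero is folded into `0`.

```python
from decimal import Decimal

def canonical_number(text: str) -> str:
    """Return a canonical spelling for a valid JSON number given as text."""
    d = Decimal(text)              # exact: no binary floating point involved
    if d.is_zero():
        return "0"                 # assumption: -0 and 0.000 both become 0
    sign, digits, exp = d.as_tuple()
    ds = "".join(str(x) for x in digits)
    # Remove redundant trailing zeros from the digits, moving them
    # into the exponent (1.50 -> digits "15", exponent -1).
    stripped = ds.rstrip("0")
    exp += len(ds) - len(stripped)
    ds = stripped
    prefix = "-" if sign else ""
    if exp >= 0:
        # Integer with `exp` trailing zeros and no decimal point.
        if exp > 2:
            return f"{prefix}{ds}e{exp}"
        return prefix + ds + "0" * exp
    point = len(ds) + exp          # digits to the left of the decimal point
    if point > 0:
        return f"{prefix}{ds[:point]}.{ds[point:]}"
    leading_zeros = -point         # zeros right after the decimal point
    if leading_zeros > 1:
        return f"{prefix}{ds}e{exp}"
    return prefix + "0." + "0" * leading_zeros + ds

for s in ["0", "1", "0.1", "0.01", "0.001", "10", "100", "1000", "1000.1", "1001"]:
    print(s, "as", canonical_number(s))
```

Because the digits are never rounded, a value such as 2.718282e-1000 keeps all of its digits and does not collapse to 0.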

>> If, in the future, we add the ability to manipulate large JSON trees
>> efficiently (e.g. by using an auxiliary table like TOAST does), we'll
>> probably want unique members, so enforcing them now may be prudent.
>
> I doubt you're going to want to reinvent TOAST, but I do think there
> are many advantages to forbidding duplicate keys. ISTM the question
> is whether to throw an error or just silently discard one of the k/v
> pairs. Keeping both should not be on the table, IMHO.

+1.
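
The two options on the table (throw an error, or silently discard one of the pairs) can be sketched in Python as follows. The hook names are hypothetical; `object_pairs_hook` is a real parameter of Python's `json.loads`, used here only as a stand-in for whatever the input function would do, and "keep the last pair" is just one possible choice of which pair to discard.

```python
import json

def reject_duplicates(pairs):
    """Policy 1: raise an error on the first repeated key."""
    members = {}
    for key, value in pairs:
        if key in members:
            raise ValueError(f"duplicate object key: {key!r}")
        members[key] = value
    return members

def keep_last(pairs):
    """Policy 2: silently discard earlier pairs; the last one wins."""
    return dict(pairs)

doc = '{"a": 1, "a": 2}'

print(json.loads(doc, object_pairs_hook=keep_last))

try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print("rejected:", err)
```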

best regards,
Florian Pflug
