Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON

From: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bernd Helmle <mailings(at)oopsware(dot)de>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON
Date: 2011-07-24 22:48:12
Message-ID: CAARyMpCD4oDeLd5OtuPu5KZo2+C3qh-3Vs92N670NSW2Xf66Tw@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jul 23, 2011 at 11:14 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I doubt you're going to want to reinvent TOAST, ...

I was thinking about making it efficient to access or update
foo.a.b.c.d[1000] in a huge JSON tree. Simply TOASTing the varlena
text means we have to unpack the entire datum to access and update
individual members. An alternative would be to split the JSON into
chunks (possibly by using the pg_toast_<id> table) and have some sort
of index that can be used to efficiently look up values by path.

This would not be trivial, and I don't plan to implement it any time soon.
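Here is a minimal in-memory Python sketch of the path-lookup idea. It rests on one assumption about what such an index could hold: every scalar leaf is keyed by its path, so foo.a.b.c.d[1000] becomes the key ("a", "b", "c", "d", 1000). The function name build_path_index is hypothetical. Nothing here touches TOAST or the pg_toast_<id> table; a real design would store the leaves in chunks and persist the index alongside them.

```python
from typing import Any, Dict, Optional, Tuple, Union

# Hypothetical path encoding: foo.a.b.c.d[1000] (foo being the column)
# becomes the tuple ("a", "b", "c", "d", 1000).
Path = Tuple[Union[str, int], ...]


def build_path_index(node: Any, prefix: Path = (),
                     index: Optional[Dict[Path, Any]] = None) -> Dict[Path, Any]:
    """Map every scalar leaf (or empty container) of a parsed JSON tree to its path."""
    if index is None:
        index = {}
    if isinstance(node, dict) and node:
        for key, child in node.items():
            build_path_index(child, prefix + (key,), index)
    elif isinstance(node, list) and node:
        for i, child in enumerate(node):
            build_path_index(child, prefix + (i,), index)
    else:
        # Scalars and empty {} / [] are stored as leaves.
        index[prefix] = node
    return index


if __name__ == "__main__":
    # Hypothetical document held in column foo.
    doc = {"a": {"b": {"c": {"d": list(range(1001))}}}}
    index = build_path_index(doc)
    path = ("a", "b", "c", "d", 1000)
    print(index[path])   # 1000
    index[path] = 42     # an update touches one entry, not the whole tree
    print(index[path])   # 42
```

Once the index exists, reading or overwriting one leaf is a single dictionary operation instead of a re-parse of the whole document. The cost is that every path is stored once.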

>
On Sun, Jul 24, 2011 at 2:19 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Jul24, 2011, at 05:14 , Robert Haas wrote:
>> On Fri, Jul 22, 2011 at 10:36 PM, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>>> ... Fortunately, JSON's definition of a
>>> "number" is its decimal syntax, so the algorithm is child's play:
>>>
>>>  * Figure out the digits and exponent.
>>>  * If the exponent is greater than 20 or less than -6 (arbitrary), use
>>> exponential notation.
>>>
>>
>
> I agree. As for your proposed algorithm, I suggest instead using
> exponential notation if it produces a shorter textual representation.
> In other words, for values between -1 and 1, we'd switch to exponential
> notation if there's more than 1 leading zero (to the right of the decimal
> point, of course), and for values outside that range if there are more than
> 2 trailing zeros and no decimal point. All of this applies after redundant
> zeros and decimal points have been removed. So we'd store
>
> 0 as 0
> 1 as 1
> 0.1 as 0.1
> 0.01 as 0.01
> 0.001 as 1e-3
> 10 as 10
> 100 as 100
> 1000 as 1e3
> 1000.1 as 1000.1
> 1001 as 1001
>
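Here is a minimal Python sketch of the shortest-encoding rule, assuming the input is already a valid JSON number string. It works on the exact decimal digits through decimal.Decimal, so no binary floating point is involved. It builds both the plain and the exponential rendering and keeps the plain one on ties, which is what the 0.01 and 100 rows above imply. The helper names are hypothetical.

```python
from decimal import Decimal
from typing import Tuple


def decompose(text: str) -> Tuple[bool, str, int]:
    """Return (negative, digits, exp) with value == int(digits) * 10**exp.

    Redundant leading/trailing zeros are stripped; zero becomes "0".
    """
    d = Decimal(text)  # parsing a string is exact, no rounding
    if not d.is_finite():
        raise ValueError("not a JSON number: %r" % text)
    sign, digit_tuple, exp = d.as_tuple()
    digits = "".join(map(str, digit_tuple)).lstrip("0")
    if not digits:
        return False, "0", 0  # -0 and 0.000 both render as plain 0
    stripped = digits.rstrip("0")
    return bool(sign), stripped, exp + len(digits) - len(stripped)


def plain(digits: str, exp: int) -> str:
    if exp >= 0:
        return digits + "0" * exp
    point = len(digits) + exp  # number of digits before the decimal point
    if point > 0:
        return digits[:point] + "." + digits[point:]
    return "0." + "0" * -point + digits


def exponential(digits: str, exp: int) -> str:
    mantissa = digits[0] + ("." + digits[1:] if len(digits) > 1 else "")
    return "%se%d" % (mantissa, exp + len(digits) - 1)


def shortest(text: str) -> str:
    negative, digits, exp = decompose(text)
    a, b = plain(digits, exp), exponential(digits, exp)
    body = b if len(b) < len(a) else a  # ties keep plain notation
    return "-" + body if negative else body


if __name__ == "__main__":
    for s in ["0", "1", "0.1", "0.01", "0.001", "10", "100", "1000", "1000.1", "1001"]:
        print(s, "as", shortest(s))
```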

Interesting idea. The reason I suggested using exponential notation
only for extreme exponents (less than -6 or greater than +20) is
partly for presentation value. Users might be annoyed to see 1000000
turned into 1e6. Moreover, applications working solely with integers
that don't expect the floating point syntax may choke on the converted
numbers. 32-bit integers can be losslessly encoded as IEEE
double-precision floats (JavaScript's internal representation), and
JavaScript's algorithm for converting a number to a string ([1],
section 9.8.1) happens to preserve the integer syntax (I think).
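For comparison, here is a sketch of the JavaScript-style rule. It follows the case analysis of ECMA-262 section 9.8.1 but applies it to the exact decimal digits of the JSON text rather than to a double, so no rounding occurs. With n defined as in the spec (value = 0.d1...dk × 10^n), plain notation is used for -6 < n <= 21, which means scientific exponents from -6 to +20. Exponential notation is used otherwise. The function names are hypothetical.

```python
from decimal import Decimal
from typing import Tuple


def split_number(text: str) -> Tuple[bool, str, int]:
    """Return (negative, digits, exp) with value == int(digits) * 10**exp.

    Leading and trailing zeros are stripped; zero becomes "0".
    Assumes text is already a valid JSON number.
    """
    d = Decimal(text)  # parsing a string is exact, no rounding
    if not d.is_finite():
        raise ValueError("not a JSON number: %r" % text)
    sign, digit_tuple, exp = d.as_tuple()
    digits = "".join(map(str, digit_tuple)).lstrip("0")
    if not digits:
        return False, "0", 0  # -0 and 0.000 both render as plain 0
    stripped = digits.rstrip("0")
    return bool(sign), stripped, exp + len(digits) - len(stripped)


def render_js_style(text: str) -> str:
    negative, digits, exp = split_number(text)
    k = len(digits)
    n = exp + k  # spec's n: value == 0.d1...dk * 10**n
    if k <= n <= 21:          # integer: digits followed by n - k zeros
        body = digits + "0" * (n - k)
    elif 0 < n <= 21:         # decimal point falls inside the digits
        body = digits[:n] + "." + digits[n:]
    elif -6 < n <= 0:         # small fraction: "0." then -n zeros
        body = "0." + "0" * -n + digits
    else:                     # exponential with explicit sign, as in JS
        e = n - 1
        mantissa = digits[0] + ("." + digits[1:] if k > 1 else "")
        body = "%se%s%d" % (mantissa, "+" if e > 0 else "-", abs(e))
    return "-" + body if negative else body


if __name__ == "__main__":
    for s in ["1000000", "2147483647", "0.000001", "1e-7", "1e21", "-1.50"]:
        print(s, "as", render_js_style(s))
```

Unlike the shortest-encoding rule, this keeps 1000000 as 1000000. Every 32-bit integer also stays in plain integer syntax, since its n is at most 10.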

Should we follow the JavaScript standard for rendering numbers (which
my suggestion approximates)? Or should we use the shortest encoding
as Florian suggests?

- Joey

[1]: http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262%205th%20edition%20December%202009.pdf
