From: Terri Laurenzo <tj(at)laurenzo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-10-20 01:15:52
Message-ID: 7C954B81-026B-40D0-9E84-3467062B9532@laurenzo.org
Lists: pgsql-hackers
I hear ya. It might be a premature optimization, but I still think there may be benefit for the case of large-scale extraction and in-database transformation of large JSON data structures. We have terabytes of this stuff, and I'd like something between the hip NoSQL options and a fully structured SQL datastore.
Terry
Sent from my iPhone
On Oct 19, 2010, at 6:36 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Oct 19, 2010 at 6:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Greg Stark <gsstark(at)mit(dot)edu> writes:
>>> The elephant in the room is if the binary encoded form is smaller then
>>> it occupies less ram and disk bandwidth to copy it around.
>>
>> It seems equally likely that a binary-encoded form could be larger
>> than the text form (that's often true for our other datatypes).
>> Again, this is an argument that would require experimental evidence
>> to back it up.
>
> That's exactly what I was thinking when I read Greg's email. I
> designed something vaguely (very vaguely) like this many years ago and
> the binary format that I worked so hard to create was enormous
> compared to the text format, mostly because I had a lot of small
> integers in the data I was serializing, and as it turns out,
> representing {0,1,2} in less than 7 bytes is not very easy. It can
> certainly be done if you set out to optimize for precisely those kinds
> of cases, but I ended up with something awful like:
>
> <4 byte type = list> <4 byte list length = 3> <4 byte type = integer>
> <4 byte integer = 0> <4 byte type = integer> <4 byte integer = 1> <4
> byte type = integer> <4 byte integer = 2>
>
> = 32 bytes. Even if you were a little smarter than I was and used 2
> byte integers (with some escape hatch allowing larger numbers to be
> represented) it's still more than twice the size of the text
> representation. Even if you use 1 byte integers it's still bigger.
> To get it down to being smaller, you've got to do something like make
> the high nibble of each byte a type field and the low nibble the first
> 4 payload bits. You can certainly do all of this but you could also
> just store it as text and let the TOAST compression algorithm worry
> about making it smaller.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
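The size arithmetic in the quoted example can be checked with a short sketch. The type tags, widths, and function names below are illustrative assumptions for this one example, not any actual PostgreSQL or proposed wire format: it compares the naive 4-byte-tag layout Robert describes against the nibble-packed variant (high nibble = type, low nibble = first 4 payload bits) and the plain text form.

```python
import struct

def encode_naive(values):
    """The 'awful' layout: 4-byte type tag + 4-byte value for everything."""
    out = struct.pack("<ii", 1, len(values))    # type = list, list length
    for v in values:
        out += struct.pack("<ii", 2, v)         # type = integer, value
    return out

def encode_nibble(values):
    """High nibble = type field, low nibble = 4 payload bits, so small
    non-negative integers (0-15) fit in a single byte."""
    out = bytes([(0x1 << 4) | len(values)])     # list header (length < 16)
    for v in values:
        out += bytes([(0x2 << 4) | (v & 0x0F)]) # one byte per small integer
    return out

text = "{0,1,2}"
print(len(encode_naive([0, 1, 2])))   # 32 bytes, matching the example
print(len(encode_nibble([0, 1, 2])))  # 4 bytes
print(len(text))                      # 7 bytes
```

This bears out the point: the naive tagged form is more than four times the text size, and only aggressive bit-packing gets below it, at which point TOAST compression of the text form may be the simpler win.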