Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Terry Laurenzo <tj(at)laurenzo(dot)org>, robertmhaas(at)gmail(dot)com
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-10-19 19:40:28
Message-ID: AANLkTikWiqbDSdAoDv73noBV9irYO6LyhCB_oSS95OVS@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 19, 2010 at 11:22 AM, Terry Laurenzo <tj(at)laurenzo(dot)org> wrote:
> Perhaps we should enumerate the attributes of what would make a good binary
> encoding?

Not sure if we're discussing the internal storage format or the binary
send/recv format, but in my humble opinion, some attributes of a good
internal format are:

1. Lightweight - it'd be really nice for the JSON datatype to be
available in core (even if extra features like JSONPath aren't).
2. Efficiency - Retrieval and storage of JSON datums should be
efficient. The internal format should probably closely resemble the
binary send/recv format so there's a good reason to use it.

A good attribute of the binary send/recv format would be
compatibility. For instance, if MongoDB (which I know very little
about) has binary send/receive, perhaps the JSON data type's binary
send/receive should use it.

Efficient retrieval and update of values in a large JSON tree would be
cool, but would be rather complex, and IMHO, overkill. JSON's main
advantage is that it's sort of a least common denominator of the type
systems of many popular languages, making it easy to transfer
information between them. Having hierarchical key/value store support
would be pretty cool, but I don't think it's what PostgreSQL's JSON
data type should do.

On Tue, Oct 19, 2010 at 3:17 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think we should take a few steps back and ask why we think that
> binary encoding is the way to go. We store XML as text, for example,
> and I can't remember any complaints about that on -bugs or
> -performance, so why do we think JSON will be different? Binary
> encoding is a trade-off. A well-designed binary encoding should make
> it quicker to extract a small chunk of a large JSON object and return
> it; however, it will also make it slower to return the whole object
> (because you're adding serialization overhead). I haven't seen any
> analysis of which of those use cases is more important and why.

Speculation: the overhead involved with retrieving/sending and
receiving/storing JSON (not to mention TOAST
compression/decompression) will be far greater than that of
serializing/unserializing.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Boszormenyi Zoltan 2010-10-19 19:50:55 Re: plan time of MASSIVE partitioning ...
Previous Message David E. Wheeler 2010-10-19 19:36:20 Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)