Re: Add jsonb_compact(...) for whitespace-free jsonb to text

From: Sehrope Sarkuni <sehrope(at)jackdb(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add jsonb_compact(...) for whitespace-free jsonb to text
Date: 2016-04-25 23:25:14
Message-ID: CAH7T-ap83PwQjtUzFUPBUa6Qd6SvzRegOjMx5GTTuFuEZBWZJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Apr 24, 2016 at 9:27 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> * Andrew Dunstan (andrew(at)dunslane(dot)net) wrote:
> > On 04/24/2016 06:02 PM, Sehrope Sarkuni wrote:
> > >AFAIK, there's also no guarantee on the specific order of the
> > >resulting properties in the text representation either. I would
> > >suppose it's fixed for a given jsonb value within a database major
> > >version but across major versions it could change (if the
> > >underlying representation changes).
> >
> > The order is fixed and very unlikely to change, as it was chosen
> > quite deliberately to help ensure efficient processing. Any change
> > in on-disk representation of data types is something we work very
> > hard to avoid, as it makes it impossible to run pg_upgrade.
>
> We do, from time-to-time, change on-disk formats in a
> backwards-compatible way though. In any case, it's my understanding
> that we don't *guarantee* any ordering currently and therefore we should
> discourage users from depending on it. If we *are* going to guarantee
> ordering, then we should document what that ordering is.
>

Yes that's the idea, namely to have a fixed text format that will not
change across releases. If the on-disk representation is already supports
that then this could just be a doc change (assuming there's agreement that
it's a good idea and said guarantee will be maintained).

Separately, I think the compact (i.e. whitespace free) output is useful on
it's own. It adds up to two bytes per key/value pair (one after the colon
and one after the comma) so the more keys you have the more the savings.

Here's a (contrived) example to show the size difference when serializing
information_schema.columns. The row_to_json(...) function returns
whitespace free output (as json, not jsonb) so it's a proxy for
json_compact(..). It comes out to 7.5% smaller than the default jsonb text
format:

app=> SELECT
MAX((SELECT COUNT(*) FROM json_object_keys(x))) AS num_keys,
AVG(length(x::text)) AS json_text,
AVG(length(x::jsonb::text)) AS jsonb_text,
AVG(length(x::text)) / AVG(length(x::jsonb::text)) AS ratio
FROM (SELECT row_to_json(z.*) AS x
FROM information_schema.columns z) t;
num_keys | json_text | jsonb_text | ratio

----------+-----------------------+-----------------------+------------------------
44 | 1077.0748522652659225 | 1164.0748522652659225 |
0.92526253803121012857
(1 row)

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-04-25 23:57:58 Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.
Previous Message Stephen Frost 2016-04-25 23:22:52 Re: Is there a way around function search_path killing SQL function inlining?