Re: jsonb format is pessimal for toast compression

From: Arthur Silva <arthurprs(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jan Wieck <jan(at)wi3ck(dot)info>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-09-12 01:56:04
Message-ID: CAO_YK0VzGrtndbQiUL67-aRxD0t2-YsC9KREgZmJCOJAz6AKhg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 11, 2014 at 10:01 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> So, I finally got time to test Tom's latest patch on this.
>
> TLDR: we want to go with Tom's latest patch and release beta3.
>
> Figures:
>
> So I tested HEAD against the latest lengths patch. Per Arthur Silva, I
> checked uncompressed times for JSONB against compressed times. This
> changed the picture considerably.
>
> TABLE SIZES
> -----------
>
> HEAD
>
> ?column? | pg_size_pretty
> ---------------------+----------------
> json text format | 393 MB
> jsonb: compressed | 1147 MB
> jsonb: uncompressed | 1221 MB
>
> PATCHED
>
> ?column? | pg_size_pretty
> ---------------------+----------------
> json text format | 394 MB
> jsonb: compressed | 525 MB
> jsonb: uncompressed | 1200 MB
>
>
> EXTRACTION TIMES
> ----------------
>
> HEAD
>
> Q1 (search via GIN index followed by extracting 100,000 values from rows):
>
> jsonb compressed: 4000
> jsonb uncompressed: 3250
>
>
> Q2 (seq scan and extract 200,000 values from rows):
>
> json: 11700
> jsonb compressed: 3150
> jsonb uncompressed: 2700
>
>
> PATCHED
>
> Q1:
>
> jsonb compressed: 6750
> jsonb uncompressed: 3350
>
> Q2:
>
> json: 11796
> jsonb compressed: 4700
> jsonb uncompressed: 2650
>
> ----------------------
>
> Conclusion: with Tom's patch, compressed JSONB is 55% smaller when
> compressed (EXTENDED). Extraction times are 50% to 70% slower, but this
> appears to be almost entirely due to decompression overhead. When not
> compressing (EXTERNAL), extraction times for patch versions are
> statistically the same as HEAD, and file sizes are similar to HEAD.
>
> USER REACTION
> -------------
>
> I polled at both PDXpgDay and at FOSS4G, asking some ~~ 80 Postgres
> users how they would feel about a compression vs. extraction time
> tradeoff. The audience was evenly split.
>
> However, with the current patch, the user can choose. Users who know
> enough for performance tuning can set JSONB columns to EXTERNAL, and the
> the same performance as the unpatched version.
>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

The compression ratio difference is exaggerated in this case but it does
support that Tom's patch alleviates the extraction penalty.

In my testings with the github archive data the savings <->
performance-penalty was fine, but I'm not confident in those results since
there were only 8 top level keys.
For comparison, some twitter api objects[1] have 30+ top level keys. If I
have time in the next couple of days I'll conduct some testings with the
public twitter fire-hose data.

[1] https://dev.twitter.com/rest/reference/get/statuses/home_timeline

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2014-09-12 02:56:21 Re: SKIP LOCKED DATA (work in progress)
Previous Message Peter Eisentraut 2014-09-12 01:44:39 Re: vacuumdb --all --analyze-in-stages - wrong order?