Re: Very Limited Toast Compression on JSONB (9.4 beta 2)

From: Larry White <ljw1001(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Very Limited Toast Compression on JSONB (9.4 beta 2)
Date: 2014-08-01 14:12:19
Message-ID: CAMdbzViu0DiRve8-ut7akYBfem8j6gfT3OKHYysq_gH2m2NMjg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

There is no TOAST compression on JSON or JSONB data in 9.4 beta 2. I'm not
sure about other versions. I'm also not sure if this is a bug or by
design, but if it is by design, I think the documentation should be
updated.

Here is a summary of my results inserting 10,000 highly compressible JSON
docs of 251K each.

Column Type - Storage - TOAST table size
JSONB - EXTERNAL - 2448 MB
JSONB - EXTENDED - 2448 MB
JSON - EXTENDED - 2504 MB
TEXT - EXTERNAL - 2409 MB
TEXT - EXTENDED - 40 MB

On Fri, Aug 1, 2014 at 2:36 AM, Larry White <ljw1001(at)gmail(dot)com> wrote:

>
> On Fri, Aug 1, 2014 at 2:20 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>> On Thursday, July 31, 2014, Larry White <ljw1001(at)gmail(dot)com> wrote:
>>
>>> Hi,
>>>
>>> I'm running an experiment on 9.4 beta 2.
>>>
>>> I put 275,000 identical JSON files into a table using JSONB (one per
>>> row). Each raw text file is 251K in size, so the total uncompressed is
>>> 69GB. The column storage is set to EXTENDED. There are other toastable
>>> columns in the table, but none have more than 36 bytes of data in them.
>>>
>>> My Toast table is 66GB. I would have expected to get that much (or more)
>>> compression just from JSONB being a binary format.
>>>
>>> If I compress one of these JSON files outside of Postgres, it goes from
>>> 251K to 1K.
>>>
>>
>> That is an astonishing amount of compression. Are you really compressing
>> one of the files in isolation, rather than co-compressing many and then
>> pro-rating the result?
>>
>
> Yes, I should have explained. These are generated JSON files for testing
> and there is massive repetition in them, which is why they compress so well
> outside of Postgres. (Basically there is a repeating array of the same
> string) I did compress just the one.
>
>
>>
>> Can you provide an example of the data, and the command line you used to
>> compress it?
>>
>
> Compressed on a Mac with the Compress UI option. Here's a brief sample
> from the file. You can see why it compresses so well:
> "{\"junk\":[\"124245etweetwet345gwtretwt43 qwrqwq qwre qw
> rsdflkas\",\"q4535 wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
> wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrs
> sdfsd\",\"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas\",\"q4535
>
> and so on.
>
> So each file should theoretically fit on a single row in the toast
>>> table. In total, the amount well under a GB when compressed outside of PG.
>>>
>>> Any guesses as to why there is so little compression of this data or how
>>> I might remedy the situation?
>>>
>>
>> PostgreSQL's built in tuple compression is generally not very good. It
>> is good at compressing long strings of identical bytes, but not good at
>> compressing the type of thing you are likely to find in JSON (unless your
>> JSON had long strings of spaces to reflect indentation of deeply nested
>> structures, which JSON probably wouldn't do and which JSONB certainly
>> wouldn't). It was designed to be very fast and to be unencumbered with the
>> patent issues common at the time it was written. It was not designed to
>> give the best possible compression ratios.
>>
>> It also compresses each row independently. Most of the compression
>> opportunities in a column of JSON data would probably be between rows, when
>> the same keys show up and over and over again, not within a row. But it
>> can't capture those opportunities.
>>
>
> I'm not expecting miracles with real data, but as far as I can tell, there
> is zero compression happening. I'm wondering if it is disabled for JSONB
> for some reason.
>
>
>>
>> Cheers,
>>
>> Jeff
>>
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Larry White 2014-08-01 14:17:02 Re: Re: Is it possible to create an index without keeping the indexed data in a column?
Previous Message Kynn Jones 2014-08-01 13:05:54 Re: How to implement a uniqueness constraint across multiple tables?