Re: jsonb format is pessimal for toast compression

From: Jan Wieck <jan(at)wi3ck(dot)info>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Larry White <ljw1001(at)gmail(dot)com>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-09-05 02:26:56
Message-ID: 54091F70.8030402@wi3ck.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08/08/2014 11:18 AM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 08/07/2014 11:17 PM, Tom Lane wrote:
>>> I looked into the issue reported in bug #11109. The problem appears to be
>>> that jsonb's on-disk format is designed in such a way that the leading
>>> portion of any JSON array or object will be fairly incompressible, because
>>> it consists mostly of a strictly-increasing series of integer offsets.
>
>> Ouch.
>
>> Back when this structure was first presented at pgCon 2013, I wondered
>> if we shouldn't extract the strings into a dictionary, because of key
>> repetition, and convinced myself that this shouldn't be necessary
>> because in significant cases TOAST would take care of it.
>
> That's not really the issue here, I think. The problem is that a
> relatively minor aspect of the representation, namely the choice to store
> a series of offsets rather than a series of lengths, produces
> nonrepetitive data even when the original input is repetitive.

This is only because the input data was exact copies of the same strings
over and over again. PGLZ can very well compress slightly less identical
strings of varying lengths too. Not as well, but well enough. But I
suspect such input data would make it fail again, even with lengths.

Regards,
Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2014-09-05 03:10:32 Re: pg_receivexlog --status-interval add fsync feedback
Previous Message Jan Wieck 2014-09-05 02:24:11 Re: jsonb format is pessimal for toast compression