Re: jsonb format is pessimal for toast compression

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Larry White <ljw1001(at)gmail(dot)com>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-08-08 15:35:59
Message-ID: 53E4EE5F.5090904@dunslane.net
Lists: pgsql-hackers


On 08/08/2014 11:18 AM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 08/07/2014 11:17 PM, Tom Lane wrote:
>>> I looked into the issue reported in bug #11109. The problem appears to be
>>> that jsonb's on-disk format is designed in such a way that the leading
>>> portion of any JSON array or object will be fairly incompressible, because
>>> it consists mostly of a strictly-increasing series of integer offsets.
>
>> Back when this structure was first presented at PGCon 2013, I wondered
>> if we shouldn't extract the strings into a dictionary, because of key
>> repetition, and convinced myself that this shouldn't be necessary
>> because in significant cases TOAST would take care of it.
>
> That's not really the issue here, I think. The problem is that a
> relatively minor aspect of the representation, namely the choice to store
> a series of offsets rather than a series of lengths, produces
> nonrepetitive data even when the original input is repetitive.

It would certainly be worth validating that changing this would fix the
problem.
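
A rough way to sanity-check the idea outside the server is sketched below.
It is only an illustration under stated assumptions: the header is modeled
as a plain array of 32-bit little-endian integers (not the actual jsonb
on-disk layout), the element lengths are made-up, and zlib stands in for
pg_lzcompress, which is a different algorithm with different matching
limits. The function names are hypothetical.

```python
import struct
import zlib


def header_with_offsets(lengths):
    """Pack cumulative end offsets as 32-bit little-endian integers."""
    out = bytearray()
    total = 0
    for n in lengths:
        total += n
        out += struct.pack("<I", total)
    return bytes(out)


def header_with_lengths(lengths):
    """Pack the per-element lengths themselves as 32-bit little-endian integers."""
    return b"".join(struct.pack("<I", n) for n in lengths)


# Assumed input: 1000 elements of 8 bytes each, i.e. highly repetitive data.
lengths = [8] * 1000

offsets_blob = header_with_offsets(lengths)
lengths_blob = header_with_lengths(lengths)

# zlib is a stand-in here; it is NOT the compressor TOAST uses.
offsets_size = len(zlib.compress(offsets_blob))
lengths_size = len(zlib.compress(lengths_blob))

print("raw header bytes:      ", len(offsets_blob))
print("compressed, offsets:   ", offsets_size)
print("compressed, lengths:   ", lengths_size)
```

Both headers have the same raw size, but the lengths version is one
four-byte pattern repeated, while the offsets version never repeats a whole
entry. That only shows the effect in miniature; the real test would be to
change the format and rerun the case from the bug report.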

I don't know how invasive that would be - I suspect (without looking
very closely) not terribly much.

> 2. Are we going to ship 9.4 without fixing this? I definitely don't see
> replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
> jsonb is still within the bounds of reason.
>
> Considering all the hype that's built up around jsonb, shipping a design
> with a fundamental performance handicap doesn't seem like a good plan
> to me. We could perhaps band-aid around it by using different compression
> parameters for jsonb, although that would require some painful API changes
> since toast_compress_datum() doesn't know what datatype it's operating on.
>
>

Yeah, it would be a bit painful, but after all finding out this sort of
thing is why we have betas.

cheers

andrew
