Re: jsonb format is pessimal for toast compression

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Larry White <ljw1001(at)gmail(dot)com>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-08-09 00:45:20
Message-ID: 29828.1407545120@sss.pgh.pa.us
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> What about considering how large the object is when we are analyzing if
> it compresses well overall?

Hmm, yeah, that's a possibility: we could redefine the limit at which
we bail out in terms of a fraction of the object size instead of a fixed
limit. However, that risks expending a large amount of work before we
bail, if we have a very large incompressible object --- which is not
exactly an unlikely case. Consider for example JPEG images stored as
bytea, which I believe I've heard of people doing. Another issue is
that it's not really clear that that fixes the problem for any fractional
size we'd want to use. In Larry's example of a jsonb value that fails
to compress, the header size is 940 bytes out of about 12K, so we'd be
needing to trial-compress about 10% of the object before we reach
compressible data --- and I doubt his example is worst-case.

>> 1. The real problem here is that jsonb is emitting quite a bit of
>> fundamentally-nonrepetitive data, even when the user-visible input is very
>> repetitive. That's a compression-unfriendly transformation by anyone's
>> measure.

> I disagree that another algorithm wouldn't be able to manage better on
> this data than pglz. pglz, from my experience, is notoriously bad at
> certain data sets which other algorithms are not as poorly impacted by.

Well, I used to be considered a compression expert, and I'm going to
disagree with you here. It's surely possible that other algorithms would
be able to get some traction where pglz fails to get any, but that doesn't
mean that presenting them with hard-to-compress data in the first place is
a good design decision. There is no scenario in which data like this is
going to be friendly to a general-purpose compression algorithm. It'd
be necessary to have explicit knowledge that the data consists of an
increasing series of four-byte integers to be able to do much with it.
And then such an assumption would break down once you got past the
header ...
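The point about an increasing series of four-byte integers can be illustrated with a toy layout (not jsonb's actual on-disk format): per-element metadata for identical elements stored as absolute offsets is nonrepetitive byte-for-byte, whereas the equivalent per-element lengths are perfectly repetitive and trivially compressible.

```python
import zlib

# Toy illustration: 1000 identical 15-byte elements laid out back to back,
# with per-element metadata stored two different ways.
ELEM_LEN = 15
N = 1000

# Absolute end offsets: 15, 30, 45, ... -- an increasing series of
# four-byte integers, so no two entries are byte-identical.
offsets = b"".join(((i + 1) * ELEM_LEN).to_bytes(4, "big") for i in range(N))

# Per-element lengths: the same four-byte value repeated N times.
lengths = ELEM_LEN.to_bytes(4, "big") * N

# The lengths compress far better than the offsets, even though both
# encode exactly the same information.
print(len(zlib.compress(offsets)), len(zlib.compress(lengths)))
```

This is only a sketch of the compression behavior, not a claim about what jsonb should store; a general-purpose compressor sees the repetition in the second form without needing any knowledge of the data's structure.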

> Perhaps another option would be a new storage type which basically says
> "just compress it, no matter what"? We'd be able to make that the
> default for jsonb columns too, no?

Meh. We could do that, but it would still require adding arguments to
toast_compress_datum() that aren't there now. In any case, this is a
band-aid solution; and as Josh notes, once we ship 9.4 we are going to
be stuck with jsonb's on-disk representation pretty much forever.

regards, tom lane
