From: | Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Larry White <ljw1001(at)gmail(dot)com> |
Subject: | Re: jsonb format is pessimal for toast compression |
Date: | 2014-08-08 06:27:51 |
Message-ID: | CAFjFpRfRpEKKUWNaUYxxQfPSjpFC2OM8pH1z7-6=H3-0O=jNzg@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Aug 8, 2014 at 10:48 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> > I looked into the issue reported in bug #11109. The problem appears to be
> > that jsonb's on-disk format is designed in such a way that the leading
> > portion of any JSON array or object will be fairly incompressible, because
> > it consists mostly of a strictly-increasing series of integer offsets.
> > This interacts poorly with the code in pglz_compress() that gives up if
> > it's found nothing compressible in the first first_success_by bytes of a
> > value-to-be-compressed. (first_success_by is 1024 in the default set of
> > compression parameters.)
>
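To make the failure mode concrete, here is a toy Python model of the early bail-out described above. It is not pglz's algorithm: it calls a value "compressible" as soon as any 3-byte sequence repeats (the 3-byte minimum is an assumption of the model), and it applies the cutoff to input bytes. The names `first_repeat_pos` and `gives_up_early` are hypothetical. The 1024-byte prefix of strictly increasing 16-bit big-endian integers merely stands in for an offset table; over this range no 3-byte sequence in it occurs twice.

```python
# Toy model only, NOT pglz's real code: "compressible" here means
# "some 3-byte sequence occurs a second time", and the first_success_by
# cutoff is measured in input bytes.

def first_repeat_pos(data: bytes, min_match: int = 3):
    """Start offset of the first min_match-byte run already seen, or None."""
    seen = set()
    for pos in range(len(data) - min_match + 1):
        gram = data[pos:pos + min_match]
        if gram in seen:
            return pos
        seen.add(gram)
    return None


def gives_up_early(data: bytes, first_success_by: int = 1024) -> bool:
    """True if no repeat starts within the first first_success_by bytes."""
    pos = first_repeat_pos(data)
    return pos is None or pos >= first_success_by


# 512 strictly increasing 16-bit values = 1024 bytes with no repeated 3-gram.
offsets = b"".join(i.to_bytes(2, "big") for i in range(512))
payload = b'{"key": "value"} ' * 200   # highly repetitive tail

value = offsets + payload
print(gives_up_early(value))     # True: the cutoff fires first
print(first_repeat_pos(value))   # 1041: the first repeat is just past the cutoff
```

In this model the value as a whole is obviously compressible, but the cutoff fires before the match search ever reaches the repetitive payload.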
> I haven't looked at this in any detail, so take this with a grain of
> salt, but what about teaching pglz_compress about using an offset
> farther into the data, if the incoming data is quite a bit larger than
> 1k? This is just a test to see if it's worthwhile to keep going, no? I
> wonder if this might even be able to be provided as a type-specific
> option, to avoid changing the behavior for types other than jsonb in
> this regard.
>
>
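The offset idea can be sketched in the same toy model. Everything here is hypothetical: `probe_offset`, the `large_factor` of 4 used to decide that an input is "quite a bit larger than 1k", and the choice to start the probe exactly one window in. The probe window is also checked in isolation, whereas a real LZ compressor can match against everything that precedes it.

```python
# Toy model with a hypothetical policy; this is not an existing pglz option.

def has_repeat(window: bytes, min_match: int = 3) -> bool:
    """True if some min_match-byte sequence occurs twice inside window."""
    seen = set()
    for pos in range(len(window) - min_match + 1):
        gram = window[pos:pos + min_match]
        if gram in seen:
            return True
        seen.add(gram)
    return False


def probe_offset(length: int, first_success_by: int = 1024,
                 large_factor: int = 4) -> int:
    """Start of the probe window: skip one window for "large" inputs."""
    if length >= large_factor * first_success_by:
        return first_success_by
    return 0


def looks_compressible(data: bytes, first_success_by: int = 1024) -> bool:
    start = probe_offset(len(data), first_success_by)
    return has_repeat(data[start:start + first_success_by])


offsets = b"".join(i.to_bytes(2, "big") for i in range(512))  # 1024 bytes
value = offsets + b'{"key": "value"} ' * 200                  # 4424 bytes

print(has_repeat(value[:1024]))   # False: the head alone looks incompressible
print(looks_compressible(value))  # True: the probe starts at byte 1024
```

Small inputs keep the current behavior (probe at byte 0), so only values well past the threshold are affected.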
+1 for offset. Or sample the data at the beginning, middle and end.
Obviously one could always come up with a worst case, but still.
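Sampling at the beginning, middle and end might look like this in the same toy model, where each window is checked in isolation for a repeated 3-byte sequence. `sample_windows` and `any_sample_compressible` are hypothetical names, not PostgreSQL functions, and the window size of 1024 simply reuses the default first_success_by.

```python
def has_repeat(window: bytes, min_match: int = 3) -> bool:
    """True if some min_match-byte sequence occurs twice inside window."""
    seen = set()
    for pos in range(len(window) - min_match + 1):
        gram = window[pos:pos + min_match]
        if gram in seen:
            return True
        seen.add(gram)
    return False


def sample_windows(length: int, window: int = 1024) -> list:
    """Start offsets of the beginning, middle and end windows (deduplicated)."""
    if length <= window:
        return [0]
    last = length - window
    return sorted({0, last // 2, last})


def any_sample_compressible(data: bytes, window: int = 1024) -> bool:
    """Accept the value if any sampled window shows a repeat."""
    return any(has_repeat(data[s:s + window])
               for s in sample_windows(len(data), window))


offsets = b"".join(i.to_bytes(2, "big") for i in range(512))  # 1024 bytes
value = offsets + b'{"key": "value"} ' * 200                  # 4424 bytes

print(sample_windows(len(value)))      # [0, 1700, 3400]
print(any_sample_compressible(value))  # True: the middle window repeats
```

The worst case is easy to construct: data whose only repetition falls between the sampled windows would still be rejected.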
> (I'm imagining a boolean saying "pick a random sample", or perhaps a
> function which can be called that'll return "here's where you wanna test
> if this thing is gonna compress at all")
>
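The two hook variants (a boolean asking for a random sample, or a type-supplied function returning where to probe) could look roughly like this. `choose_probe_start`, its parameters, and the idea of passing the hook per data type are all hypothetical; they are not existing pglz or PostgreSQL interfaces.

```python
import random
from typing import Callable, Optional

# Hypothetical per-type hook: given the raw value, return the byte offset
# at which the compressibility probe should start.
ProbeHook = Callable[[bytes], int]


def choose_probe_start(data: bytes, window: int = 1024,
                       random_sample: bool = False,
                       hook: Optional[ProbeHook] = None,
                       rng: Optional[random.Random] = None) -> int:
    """Pick where a window-sized probe starts (toy model, hypothetical API)."""
    if hook is not None:
        start = hook(data)                  # a type-specific answer wins
    elif random_sample and len(data) > window:
        rng = rng or random.Random()
        start = rng.randrange(len(data) - window + 1)
    else:
        start = 0                           # current behavior: probe the head
    # Clamp so the window stays inside the value.
    return max(0, min(start, len(data) - window))


data = bytes(5000)
print(choose_probe_start(data))                        # 0
print(choose_probe_start(data, hook=lambda d: 2048))   # 2048
print(choose_probe_start(data, hook=lambda d: 10**9))  # 3976 (clamped)
print(choose_probe_start(data, random_sample=True, rng=random.Random(42)))
```

Keeping the default at offset 0 means types that supply neither option see no change, which matches the aim of not altering behavior for types other than jsonb.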
> I'm rather disinclined to change the on-disk format because of this
> specific test, that feels a bit like the tail wagging the dog to me,
> especially as I do hope that some day we'll figure out a way to use a
> better compression algorithm than pglz.
>
> Thanks,
>
> Stephen
>
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company