Re: jsonb format is pessimal for toast compression

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jan Wieck <jan(at)wi3ck(dot)info>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-09-26 03:10:09
Message-ID: 16663.1411701009@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> The "offsets-and-lengths" patch seems like the approach we ought to
> compare to my patch, but it looks pretty unfinished to me: AFAICS it
> includes logic to understand offsets sprinkled into a mostly-lengths
> array, but no logic that would actually *store* any such offsets,
> which means it's going to act just like my patch for performance
> purposes.

> In the interests of pushing this forward, I will work today on
> trying to finish and review Heikki's offsets-and-lengths patch
> so that we have something we can do performance testing on.
> I doubt that the performance testing will tell us anything we
> don't expect, but we should do it anyway.

I've now done that, and attached is what I think would be a committable
version. Having done this work, I no longer think that this approach
is significantly messier code-wise than the all-lengths version, and
it does have the merit of not degrading on very large objects/arrays.
So at the moment I'm leaning to this solution not the all-lengths one.

To get a sense of the compression effects of varying the stride distance,
I repeated the compression measurements I'd done on 14 August with Pavel's
geometry data (<24077(dot)1408052877(at)sss(dot)pgh(dot)pa(dot)us>). The upshot of that was

min max avg

external text representation 220 172685 880.3
JSON representation (compressed text) 224 78565 541.3
pg_column_size, JSONB HEAD repr. 225 82540 639.0
pg_column_size, all-lengths repr. 225 66794 531.1

Here's what I get with this patch and different stride distances:

JB_OFFSET_STRIDE = 8 225 68551 559.7
JB_OFFSET_STRIDE = 16 225 67601 552.3
JB_OFFSET_STRIDE = 32 225 67120 547.4
JB_OFFSET_STRIDE = 64 225 66886 546.9
JB_OFFSET_STRIDE = 128 225 66879 546.9
JB_OFFSET_STRIDE = 256 225 66846 546.8

So at least for that test data, 32 seems like the sweet spot.
We are giving up a couple percent of space in comparison to the
all-lengths version, but this is probably an acceptable tradeoff
for not degrading on very large arrays.

I've not done any speed testing.

regards, tom lane

Attachment Content-Type Size
jsonb-with-offsets-and-lengths-4.patch text/x-diff 43.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2014-09-26 04:40:19 Re: jsonb format is pessimal for toast compression
Previous Message Robert Haas 2014-09-26 02:44:49 Re: Replication identifiers, take 3