Re: jsonb format is pessimal for toast compression

From: Arthur Silva <arthurprs(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Larry White <ljw1001(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)heroku(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-08-20 22:42:28
Message-ID: CAO_YK0Wmco2g_tOT-3ekcHSfwXLr_X6GKMBc_vxOP3TvRoiZrw@mail.gmail.com
Lists: pgsql-hackers

What data are you using right now, Josh?

There's the GitHub Archive: http://www.githubarchive.org/
Here's some sample data: https://gist.github.com/igrigorik/2017462

--
Arthur Silva

On Wed, Aug 20, 2014 at 6:09 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> On 08/20/2014 08:29 AM, Tom Lane wrote:
> > Josh Berkus <josh(at)agliodbs(dot)com> writes:
> >> On 08/15/2014 04:19 PM, Tom Lane wrote:
> >>> Personally I'd prefer to go to the all-lengths approach, but a large
> >>> part of that comes from a subjective assessment that the hybrid approach
> >>> is too messy. Others might well disagree.
> >
> >> ... So, that extraction test is about 1% *slower* than the basic Tom Lane
> >> lengths-only patch, and still 80% slower than original JSONB. And it's
> >> the same size as the lengths-only version.
> >
> > Since it's looking like this might be the direction we want to go, I took
> > the time to flesh out my proof-of-concept patch. The attached version
> > takes care of cosmetic issues (like fixing the comments), and includes
> > code to avoid O(N^2) penalties in findJsonbValueFromContainer and
> > JsonbIteratorNext. I'm not sure whether those changes will help
> > noticeably on Josh's test case; for me, they seemed worth making, but
> > they do not bring the code back to full speed parity with the all-offsets
> > version. But as we've been discussing, it seems likely that those costs
> > would be swamped by compression and I/O considerations in most scenarios
> > with large documents; and of course for small documents it hardly matters.
>
> Table sizes and extraction times are unchanged from the prior patch
> based on my workload.
>
> We should be comparing all-lengths vs length-and-offset maybe using
> another workload as well ...
>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com
>
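
The cost trade-off in the quoted paragraph (lengths-only storage versus stored offsets, and the O(N^2) penalty when walking a container) can be illustrated with a toy model. This is only a sketch in Python, not the jsonb C code: the function names and the sample lengths are hypothetical, and carrying a running offset is just one way to avoid the quadratic walk; the message does not say this is how the patch does it.

```python
from itertools import accumulate


def end_offsets(lengths):
    """Rebuild per-element end offsets from lengths in a single O(N) pass."""
    return list(accumulate(lengths))


def start_naive(lengths, i):
    """With lengths only, the start of element i is the sum of all
    preceding lengths, which costs O(i) per lookup."""
    return sum(lengths[:i])


def walk_naive(lengths):
    """Recomputes every start from scratch: O(N^2) over the container."""
    return [(start_naive(lengths, i), n) for i, n in enumerate(lengths)]


def walk_running(lengths):
    """Carries a running offset forward: O(N) over the container."""
    result, start = [], 0
    for n in lengths:
        result.append((start, n))
        start += n
    return result


# Hypothetical element lengths for one container.
lengths = [5, 5, 5, 5, 12, 5]

# Both walks yield the same (start, length) pairs.
pairs = walk_running(lengths)
same = walk_naive(lengths) == pairs
```

In this model the stored lengths repeat (5, 5, 5, 5, ...) while the equivalent offsets (5, 10, 15, 20, ...) are all distinct, which is the property the thread's subject line is about: repeated values give a compressor more to work with.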
