Re: jsonb format is pessimal for toast compression

From: Arthur Silva <arthurprs(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Larry White <ljw1001(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)heroku(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-08-15 21:20:39
Message-ID: CAO_YK0VZRG6yjBgyrfQ7Rz=K7j-xq6HYcfVQZcTO3WNhXHCwSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm still getting up to speed on postgres development but I'd like to leave
an opinion.

We should add some sort of versionning to the jsonb format. This can be
explored in the future in many ways.

As for the current problem, we should explore the directory at the end
option. It should improve compression and keep good access performance.

A 4 byte header is sufficient to store the directory offset and some
versionning bits.
Em 15/08/2014 17:39, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> escreveu:

> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > On 08/14/2014 07:24 PM, Tom Lane wrote:
> >> We can certainly reduce that. The question was whether it would be
> >> worth the effort to try. At this point, with three different test
> >> data sets having shown clear space savings, I think it is worth
> >> the effort. I'll poke into it tomorrow or over the weekend, unless
> >> somebody beats me to it.
>
> > Note that I specifically created that data set to be a worst case: many
> > top-level keys, no nesting, and small values. However, I don't think
> > it's an unrealistic worst case.
>
> > Interestingly, even on the unpatched, 1GB table case, the *index* on the
> > JSONB is only 60MB. Which shows just how terrific the improvement in
> > GIN index size/performance is.
>
> I've been poking at this, and I think the main explanation for your result
> is that with more JSONB documents being subject to compression, we're
> spending more time in pglz_decompress. There's no free lunch in that
> department: if you want compressed storage it's gonna cost ya to
> decompress. The only way I can get decompression and TOAST access to not
> dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE
> PLAIN. However, when I do that, I do see my test patch running about 25%
> slower overall than HEAD on an "explain analyze select jfield -> 'key'
> from table" type of query with 200-key documents with narrow fields (see
> attached perl script that generates the test data).
>
> It seems difficult to improve much on that for this test case. I put some
> logic into findJsonbValueFromContainer to calculate the offset sums just
> once not once per binary-search iteration, but that only improved matters
> 5% at best. I still think it'd be worth modifying the JsonbIterator code
> to avoid repetitive offset calculations, but that's not too relevant to
> this test case.
>
> Having said all that, I think this test is something of a contrived worst
> case. More realistic cases are likely to have many fewer keys (so that
> speed of the binary search loop is less of an issue) or else to have total
> document sizes large enough that inline PLAIN storage isn't an option,
> meaning that detoast+decompression costs will dominate.
>
> regards, tom lane
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message worthy7 2014-08-15 21:59:19 GIST create index very very slow
Previous Message Tom Lane 2014-08-15 20:57:23 Re: Supporting Windows SChannel as OpenSSL replacement