From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Peter Geoghegan <pg(at)heroku(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Larry White <ljw1001(at)gmail(dot)com>
Subject: Re: [Bad Attachment] Re: jsonb format is pessimal for toast compression
Date: 2014-08-15 22:18:30
Message-ID: 53EE8736.5090505@agliodbs.com
Lists: pgsql-hackers

On 08/15/2014 01:38 PM, Tom Lane wrote:
> I've been poking at this, and I think the main explanation for your result
> is that with more JSONB documents being subject to compression, we're
> spending more time in pglz_decompress. There's no free lunch in that
> department: if you want compressed storage it's gonna cost ya to
> decompress. The only way I can get decompression and TOAST access to not
> dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE
> PLAIN. However, when I do that, I do see my test patch running about 25%
> slower overall than HEAD on an "explain analyze select jfield -> 'key'
> from table" type of query with 200-key documents with narrow fields (see
> attached perl script that generates the test data).

Ok, that probably falls under the heading of "acceptable tradeoffs" then.
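
For anyone trying to reproduce that without the attachment, I assume the
setup is roughly the following (table and column names here are just
placeholders, not Tom's actual test schema):

    -- keep the jsonb column inline and uncompressed, so pglz_decompress
    -- and detoasting don't dominate the profile
    ALTER TABLE bench ALTER COLUMN jfield SET STORAGE PLAIN;

    -- probe query of the kind being timed: pull one key out of each document
    EXPLAIN ANALYZE SELECT jfield -> 'key' FROM bench;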

> Having said all that, I think this test is something of a contrived worst
> case. More realistic cases are likely to have many fewer keys (so that
> speed of the binary search loop is less of an issue) or else to have total
> document sizes large enough that inline PLAIN storage isn't an option,
> meaning that detoast+decompression costs will dominate.

This was intended to be a worst case. However, I don't think it will be
the last time we see documents with 100 to 200 keys, each with a short
value. That case actually came from some XML data that I'd already
converted into a regular table (hence every row having 183 keys), but if
JSONB had been available when I started the project, I might have chosen
to store it as JSONB instead. It occurs to me that the matching data
from a personals website would also fit this pattern: somewhere between
50 and 200 keys, each with a short value.
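
Purely to illustrate that shape (this is not the attached perl script;
the table name, key names, values, and row count below are invented),
something like this produces documents in that ballpark:

    -- illustrative only: ~183 short keys per document, 8-char values
    CREATE TABLE jsonb_bench AS
    SELECT i,
           (SELECT json_object_agg('k' || g,
                                   substr(md5(i::text || '.' || g::text), 1, 8))
              FROM generate_series(1, 183) AS g)::jsonb AS jfield
      FROM generate_series(1, 100000) AS i;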

So we don't need to *optimize* for that case, but it also shouldn't be
disastrously slow, or 300% of the size of the comparable TEXT
representation. Mind you, I don't find an 80% slowdown disastrous
(especially not with a 60% space savings), so maybe that's good enough.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
