Re: jsonb format is pessimal for toast compression

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Arthur Silva <arthurprs(at)gmail(dot)com>, Larry White <ljw1001(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)heroku(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-08-20 15:29:55
Message-ID: 15378.1408548595@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> On 08/15/2014 04:19 PM, Tom Lane wrote:
>> Personally I'd prefer to go to the all-lengths approach, but a large
>> part of that comes from a subjective assessment that the hybrid approach
>> is too messy. Others might well disagree.

> ... So, that extraction test is about 1% *slower* than the basic Tom Lane
> lengths-only patch, and still 80% slower than original JSONB. And it's
> the same size as the lengths-only version.

Since it's looking like this might be the direction we want to go, I took
the time to flesh out my proof-of-concept patch. The attached version
takes care of cosmetic issues (like fixing the comments), and includes
code to avoid O(N^2) penalties in findJsonbValueFromContainer and
JsonbIteratorNext. I'm not sure whether those changes will help
noticeably on Josh's test case; for me, they seemed worth making, but
they do not bring the code back to full speed parity with the all-offsets
version. But as we've been discussing, it seems likely that those costs
would be swamped by compression and I/O considerations in most scenarios
with large documents; and of course for small documents it hardly matters.

Even if we don't go this way, there are parts of this patch that would
need to get committed. I found for instance that convertJsonbArray and
convertJsonbObject have insufficient defenses against overflowing the
overall length field for the array or object.

For my own part, I'm satisfied with the patch as attached (modulo the
need to teach pg_upgrade about the incompatibility). There remains the
question of whether to take this opportunity to add a version ID to the
binary format. I'm not as excited about that idea as I originally was;
having now studied the code more carefully, I think that any expansion
would likely happen by adding more type codes and/or commandeering the
currently-unused high-order bit of JEntrys. We don't need a version ID
in the header for that. Moreover, if we did have such an ID, it would be
notationally painful to get it to most of the places that might need it.

regards, tom lane

Attachment Content-Type Size
jsonb-all-lengths.patch text/x-diff 29.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thom Brown 2014-08-20 15:35:18 Re: [GSoC2014] Patch ALTER TABLE ... SET LOGGED
Previous Message Tom Lane 2014-08-20 14:36:40 Re: [patch] pg_copy - a command for reliable WAL archiving