Re: jsonb format is pessimal for toast compression

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Arthur Silva <arthurprs(at)gmail(dot)com>, Larry White <ljw1001(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)heroku(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-08-26 14:51:39
Message-ID: 11068.1409064699@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> On 08/16/2014 02:19 AM, Tom Lane wrote:
>> I think the realistic alternatives at this point are either to
>> switch to all-lengths as in my test patch, or to use the hybrid approach
>> of Heikki's test patch. ...
>> Personally I'd prefer to go to the all-lengths approach, but a large
>> part of that comes from a subjective assessment that the hybrid approach
>> is too messy. Others might well disagree.

> It's not too pretty, no. But it would be nice to not have to make a
> tradeoff between lookup speed and compressibility.

> Yet another idea is to store all lengths, but add an additional array of
> offsets to JsonbContainer. The array would contain the offset of, say,
> every 16th element. It would be very small compared to the lengths
> array, but would greatly speed up random access on a large array/object.

That does nothing to address my basic concern about the patch, which is
that it's too complicated and therefore bug-prone. Moreover, it'd lose
on-disk compatibility, which is really the sole saving grace of the
proposal.
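
[Editor's illustration, not part of the original message.] For readers
following the thread, the sampled-offsets idea quoted above can be sketched
roughly as follows. This is a minimal model in Python, not the actual
JsonbContainer layout: the function names, the plain-list representation,
and the choice to store start offsets are all assumptions made for
illustration. Only the stride of 16 comes from the quoted suggestion.

```python
from itertools import accumulate

STRIDE = 16  # "every 16th element", as in the quoted suggestion


def build_sampled_offsets(lengths, stride=STRIDE):
    """Return the start offsets of elements 0, stride, 2*stride, ...

    Hypothetical helper: the real on-disk format is not modelled here.
    """
    # Start offset of element i is the sum of all earlier lengths.
    starts = [0] + list(accumulate(lengths))[:-1]
    return starts[::stride]


def offset_of(index, lengths, sampled, stride=STRIDE):
    """Start offset of element `index` using the sampled offsets.

    Jump to the nearest sample at or before `index`, then add up at
    most stride - 1 lengths instead of all `index` of them.
    """
    block = index // stride
    return sampled[block] + sum(lengths[block * stride:index])


def offset_of_lengths_only(index, lengths):
    """All-lengths lookup with no offsets: sum every earlier length."""
    return sum(lengths[:index])
```

Both lookups return the same offset. The difference is that the sampled
variant touches at most 15 lengths per lookup, at the price of a second
array that has to be kept consistent with the first, which is the kind of
extra complexity objected to above.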

My feeling about it at this point is that the apparent speed gain from
using offsets is illusory: in practically all real-world cases where there
are enough keys or array elements for it to matter, costs associated with
compression (or rather failure to compress) will dominate any savings we
get from offset-assisted lookups. I agree that the evidence for this
opinion is pretty thin ... but the evidence against it is nonexistent.
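
[Editor's illustration, not part of the original message.] The
compressibility side of this tradeoff can be seen in a toy experiment.
The sketch below packs the same set of element sizes once as lengths and
once as cumulative end offsets, then compresses both. It uses zlib purely
as a stand-in; PostgreSQL's TOAST compressor is pglz, which behaves
differently, so this only shows the direction of the effect and is not
evidence about real jsonb data. The helper names and the 32-bit
little-endian packing are assumptions.

```python
import struct
import zlib


def pack_u32(values):
    """Pack integers as little-endian unsigned 32-bit words."""
    return struct.pack(f"<{len(values)}I", *values)


def compressed_sizes(lengths):
    """Return (size of compressed lengths, size of compressed offsets).

    Offsets are the cumulative end offsets derived from `lengths`.
    zlib is an assumption standing in for the real compressor.
    """
    offsets = []
    total = 0
    for n in lengths:
        total += n
        offsets.append(total)
    return (
        len(zlib.compress(pack_u32(lengths))),
        len(zlib.compress(pack_u32(offsets))),
    )


# 1000 elements of identical size: the lengths array is one repeated
# word, while every word in the offsets array is different.
len_size, off_size = compressed_sizes([8] * 1000)
print(len_size, off_size)
```

With uniform element sizes the lengths array is a single repeated word,
whereas the offsets array never repeats a value, so it compresses far
less well. Real documents have less regular sizes, so the gap will vary.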

regards, tom lane
