Re: jsonb format is pessimal for toast compression

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jan Wieck <jan(at)wi3ck(dot)info>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Claudio Freire" <klaussfreire(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-09-24 10:40:34
Message-ID: 54229FA2.90909@vmware.com
Lists: pgsql-hackers

On 09/24/2014 08:16 AM, Tom Lane wrote:
> Jan Wieck <jan(at)wi3ck(dot)info> writes:
>> On 09/15/2014 09:46 PM, Craig Ringer wrote:
>>> Anyway - this is looking like the change will go in, and with it a
>>> catversion bump. Introduction of a jsonb version/flags byte might be
>>> worthwhile at the same time. It seems likely that there'll be more room
>>> for improvement in jsonb, possibly even down to using different formats
>>> for different data.
>>>
>>> Is it worth paying a byte per value to save on possible upgrade pain?
>
>> If there indeed has to be a catversion bump in the process of this, then
>> I agree with Craig.
>
> FWIW, I don't really. To begin with, it wouldn't be a byte per value,
> it'd be four bytes, because we need word-alignment of the jsonb contents
> so there's noplace to squeeze in an ID byte for free. Secondly, as I
> wrote in <15378(dot)1408548595(at)sss(dot)pgh(dot)pa(dot)us>:
>
> : There remains the
> : question of whether to take this opportunity to add a version ID to the
> : binary format. I'm not as excited about that idea as I originally was;
> : having now studied the code more carefully, I think that any expansion
> : would likely happen by adding more type codes and/or commandeering the
> : currently-unused high-order bit of JEntrys. We don't need a version ID
> : in the header for that. Moreover, if we did have such an ID, it would be
> : notationally painful to get it to most of the places that might need it.
>
> Heikki's patch would eat up the high-order JEntry bits, but the other
> points remain.

If we don't need to be backwards-compatible with the 9.4beta on-disk
format, we don't necessarily need to eat the high-order JEntry bit. You
can just assume that every nth element is stored as an offset, and
the rest as lengths. Although it would be nice to have the flag for it
explicitly.
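
To make the "every nth element is an offset" idea concrete, here is a minimal sketch in Python rather than the C of the actual patch. The names `STRIDE`, `encode_entries`, and `start_offset` are hypothetical, the stride value is arbitrary, and real JEntrys also carry type bits that are ignored here. It assumes that the entries at indexes divisible by the stride hold the cumulative end offset and that all others hold the element's length, so no flag bit is needed to tell them apart.

```python
STRIDE = 4  # hypothetical stride; the value is an assumption for illustration


def encode_entries(lengths, stride=STRIDE):
    """Build the entry array from a list of element lengths.

    Entries whose index is a multiple of `stride` store the cumulative
    end offset of that element; all other entries store its length.
    """
    entries = []
    end = 0
    for i, length in enumerate(lengths):
        end += length
        entries.append(end if i % stride == 0 else length)
    return entries


def start_offset(entries, index, stride=STRIDE):
    """Return the start offset of element `index`.

    Walk backwards summing lengths until an offset entry is reached.
    Which entries are offsets is known from position alone.
    """
    offset = 0
    for i in range(index - 1, -1, -1):
        offset += entries[i]
        if i % stride == 0:
            break  # entries[i] was an end offset, so the sum is complete
    return offset


lengths = [3, 5, 2, 7, 1, 4]
entries = encode_entries(lengths)
starts = [start_offset(entries, i) for i in range(len(lengths))]
```

Lookup cost is bounded by the stride, while most entries remain small, repetitive lengths that compress well. An explicit flag bit would let a reader tell the two kinds of entry apart without relying on position, which is the trade-off mentioned above.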

There are also a few free bits in the JsonbContainer header that can be
used as a version ID in the future. So I don't think we need to change
the format to add an explicit version ID field.
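
As a rough illustration of why spare header bits are enough, the sketch below packs a count, type flags, and a version number into one 32-bit word. The layout, masks, and function names are all hypothetical and are not the actual JsonbContainer definitions. The point is only that if the spare bits are always written as zero today, existing data reads back as version 0 and a later release can assign nonzero values.

```python
# Hypothetical 32-bit header layout, for illustration only.
COUNT_MASK = 0x00FFFFFF     # assumed: element count in the low bits
VERSION_MASK = 0x0F000000   # assumed: spare bits reserved for a version ID
VERSION_SHIFT = 24
FLAG_SCALAR = 0x10000000    # assumed type flags
FLAG_OBJECT = 0x20000000
FLAG_ARRAY = 0x40000000


def pack_header(count, flags, version=0):
    """Combine count, type flags, and version into one header word."""
    if count & ~COUNT_MASK:
        raise ValueError("count does not fit in the count field")
    if not 0 <= version <= (VERSION_MASK >> VERSION_SHIFT):
        raise ValueError("version does not fit in the spare bits")
    return count | flags | (version << VERSION_SHIFT)


def header_version(header):
    """Extract the version; data written with zeroed spare bits is version 0."""
    return (header & VERSION_MASK) >> VERSION_SHIFT


old_header = pack_header(5, FLAG_ARRAY)             # written before any versioning
new_header = pack_header(5, FLAG_ARRAY, version=1)  # a possible future format
```

Under these assumptions no extra word is spent per value, which avoids the four-byte alignment cost raised earlier in the thread.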

- Heikki
