Re: jsonb format is pessimal for toast compression

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Larry White <ljw1001(at)gmail(dot)com>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-08-09 01:44:32
Message-ID: 20140809014432.GW16422@tamriel.snowman.net
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > I agree that we need to avoid changing jsonb's on-disk representation.
>
> ... post-release, I assume you mean.

Yes.

> > Have I missed where a good suggestion has been made about how to do that
> > which preserves the binary-search capabilities and doesn't make the code
> > much more difficult?
>
> We don't have one yet, but we've only been thinking about this for a few
> hours.

Fair enough.

> > Trying to move the header to the end just for the
> > sake of this doesn't strike me as a good solution as it'll make things
> > quite a bit more complicated. Is there a way we could interleave the
> > likely-compressible user data in with the header instead?
>
> Yeah, I was wondering about that too, but I don't immediately see how to
> do it without some sort of preprocessing step when we read the object
> (which'd be morally equivalent to converting a series of lengths into a
> pointer array). Binary search isn't going to work if the items it's
> searching in aren't all the same size.
>
> Having said that, I am not sure that a preprocessing step is a
> deal-breaker. It'd be O(N), but with a pretty darn small constant factor,
> and for plausible sizes of objects I think the binary search might still
> dominate. Worth investigation perhaps.

For my part, I'm less concerned about a preprocessing step which happens
when we store the data and more concerned about ensuring that we're able
to extract data quickly. Perhaps that's simply because I'm used to
writes being more expensive than reads, but I'm not alone in that
regard either. I doubt I'll have time in the next couple of weeks to
look into this and if we're going to want this change for 9.4, we really
need someone working on it sooner rather than later. (To the crowd:) do
we have any takers for this investigation?
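
For anyone picking this up, the lengths-to-pointer-array step Tom describes
above can be sketched roughly as below. This is only an illustration under
stated assumptions, not jsonb's actual on-disk layout: the names
build_offsets and find_key are hypothetical, the keys are assumed to be
stored back to back in plain bytewise sorted order, and the header is
assumed to hold only per-key lengths.

```python
from itertools import accumulate


def build_offsets(lengths):
    """O(N) preprocessing: turn per-item lengths into start offsets.

    The result has len(lengths) + 1 entries, so item i occupies
    data[offsets[i]:offsets[i + 1]].
    """
    return [0] + list(accumulate(lengths))


def find_key(data, lengths, target):
    """Return the index of target among the packed keys, or -1 if absent.

    Assumes (for this sketch only) that the keys in data are sorted
    bytewise; the real jsonb ordering may differ.
    """
    offsets = build_offsets(lengths)  # the O(N) step
    lo, hi = 0, len(lengths)
    while lo < hi:  # the O(log N) binary search over the offset array
        mid = (lo + hi) // 2
        key = data[offsets[mid]:offsets[mid + 1]]
        if key < target:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(lengths) and data[offsets[lo]:offsets[lo + 1]] == target:
        return lo
    return -1


# Three variable-length keys packed together, with only lengths stored.
packed = b"abbccc"
key_lengths = [1, 2, 3]
print(find_key(packed, key_lengths, b"bb"))
```

The prefix sum is a single pass of integer additions, which is the "pretty
darn small constant factor" in question; whether it stays negligible next
to the search and decompression costs is exactly what would need measuring.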

Thanks,

Stephen
