Re: jsonb format is pessimal for toast compression

From: Arthur Silva <arthurprs(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jan Wieck <jan(at)wi3ck(dot)info>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-09-16 20:06:43
Message-ID: CAO_YK0X56vyf8Vb3reYSi9aC8sxSxbCZmF0LQCDz0ozjb=87bw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 16, 2014 at 4:20 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Sep 16, 2014 at 1:11 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> >>> Well, I can only judge from the use cases I personally have, none of
> >>> which involve more than 100 keys at any level for most rows. So far
> >>> I've seen some people argue hypothetical use cases involving hundreds of
> >>> keys per level, but nobody who *actually* has such a use case.
> >>
> >> I already told you that I did, and that it was the one and only app I
> >> had written for JSONB.
> >
> > Ah, ok, I thought yours was a test case. Did you check how it performed
> > on the two patches at all? My tests with 185 keys didn't show any
> > difference, including for a "last key" case.
>
> No, I didn't test it. But I think Heikki's test results pretty much
> tell us everything there is to see here. This isn't really that
> complicated; I've read a few papers on index compression over the
> years and they seem to often use techniques that have the same general
> flavor as what Heikki did here, adding complexity in the data format
> to gain other advantages. So I don't think we should be put off.
>

I second this reasoning. Even though I ran a couple of very realistic test
cases that support all-lengths, I do feel that the hybrid approach would be
better, as it covers all bases. To put things in perspective, Tom's latest
patch isn't much simpler either.

Since it would still be a breaking change, we should consider changing the
layout to key-key-key-value-value-value, as it seems to pay off.
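
To make the hybrid idea concrete, here is a rough Python sketch of how I
understand it: most entries store an element's length (which repeats and so
compresses well), while every STRIDE-th entry stores an absolute end offset,
so a lookup only has to walk back a bounded number of entries. The names
encode, start_offset and STRIDE, and the stride value itself, are made up
for illustration and are not taken from either patch.

```python
STRIDE = 4  # hypothetical stride, chosen small so the example is easy to trace


def encode(lengths, stride=STRIDE):
    """Build one (is_offset, value) entry per element.

    Every stride-th entry stores the element's absolute end offset;
    all other entries store just the element's length.
    """
    entries = []
    end = 0
    for i, n in enumerate(lengths):
        end += n
        if i % stride == 0:
            entries.append((True, end))
        else:
            entries.append((False, n))
    return entries


def start_offset(entries, index):
    """Return where element `index` starts in the data area.

    Walks backwards summing lengths until it reaches a stored offset,
    so it inspects at most `stride` entries regardless of `index`.
    """
    offset = 0
    for i in range(index - 1, -1, -1):
        is_offset, value = entries[i]
        offset += value
        if is_offset:
            break
    return offset


entries = encode([3, 5, 2, 7, 1, 4, 6, 2, 9])
starts = [start_offset(entries, i) for i in range(len(entries))]
```

With an all-lengths format the same lookup would have to sum every
preceding length; with all-offsets it would be a single read, at the price
of an array that compresses poorly.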

> Basically, I think that if we make a decision to use Tom's patch
> rather than Heikki's patch, we're deciding that the initial decision,
> by the folks who wrote the original jsonb code, to make array access
> less than O(n) was misguided. While that could be true, I'd prefer to
> bet that those folks knew what they were doing. The only reason
> we're even considering changing it is that the array of lengths
> doesn't compress well, and we've got an approach that fixes that
> problem while preserving the advantages of fast lookup. We should
> have a darn fine reason to say no to that approach, and "it didn't
> benefit my particular use case" is not it.
>
> In practice, I'm not very surprised that the impact doesn't seem too
> bad when you're running SQL queries from the client. There's so much
> other overhead, for de-TOASTing and client communication and even just
> planner and executor costs, that this gets lost in the noise. But
> think about a PL/pgsql procedure, say, where somebody might loop over
> all of the elements in an array. If those operations go from O(1) to
> O(n), then the loop goes from O(n) to O(n^2). I will bet you a
> beverage of your choice that somebody will find that behavior within a
> year of release and be dismayed by it.
>
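
The loop argument is easy to check with a back-of-the-envelope count. This
is only a sketch under simplifying assumptions: the cost of fetching an
element is taken to be the number of header entries inspected, and the
stride of 32 is a hypothetical value, not one taken from either patch.

```python
def steps_all_lengths(index):
    # With only lengths stored, finding element `index` means summing
    # the lengths of every element before it.
    return index


def steps_hybrid(index, stride):
    # With an offset stored at every stride-th entry, the walk back from
    # entry index-1 stops at the nearest entry whose position is a
    # multiple of `stride`.
    if index == 0:
        return 0
    return (index - 1) % stride + 1


def loop_cost(n, steps):
    # Total entries inspected when visiting elements 0..n-1 one by one.
    return sum(steps(i) for i in range(n))


n = 1000
stride = 32  # hypothetical
quadratic = loop_cost(n, steps_all_lengths)
bounded = loop_cost(n, lambda i: steps_hybrid(i, stride))
print(quadratic, bounded)
```

The all-lengths total grows as n*(n-1)/2, while the hybrid total stays
below stride*n, so a loop over the whole array remains linear in n.
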
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
