Re: jsonb and nested hstore

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: jsonb and nested hstore
Date: 2014-02-25 14:46:34
Message-ID: CAHyXU0w11O5u=Y54_HjRrhY_GQbNX_k=Y9xLwoyf_q6-qOEsvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 24, 2014 at 1:15 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> On 02/24/2014 11:06 AM, Merlin Moncure wrote:
>>
>> On Mon, Feb 24, 2014 at 9:08 AM, Merlin Moncure <mmoncure(at)gmail(dot)com>
>> wrote:
>>>
>>> On Mon, Feb 24, 2014 at 8:46 AM, Merlin Moncure <mmoncure(at)gmail(dot)com>
>>> wrote:
>>>>
>>>> I still find the phrasing "as jsonb is more efficient for most
>>>> purposes" to be a bit off Basically, the text json type is faster for
>>>> serialization/deserialization pattern (not just document preservation)
>>>> and jsonb is preferred when storing json and doing repeated
>>>> subdocument accesses.
>>>
>>> Hm, I'm going to withdraw that. I had done some testing of simple
>>> deserialization (cast to text and the like) and noted that jsonb was
>>> as much as 5x slower. However, I just did some checking on
>>> json[b]_populate_recordset though and it's pretty much a wash.
>>
>> [sorry for noise on this].
>>
>> Here's the use case coverage as I see it today:
>>
>> CASE: json jsonb hstore
>> Static document: yes poor poor
>> Precise document: yes no no
>> Serialization: yes no no****
>> Deserialization: poor*** yes* no****
>> Repeated Access: poor yes yes
>> Manipulation: no no** yes
>> GIST/GIN searching: no no** yes
>>
>> notes:
>> * jsonb gets 'yes' for deserialization assuming andrew's 'two level'
>> deserialization fix goes in (otherwise 'poor').
>> ** jsonb can't do this today, but presumably will be able to soon
>> *** 'poor' unless json type also gets the deserialization fix, then 'yes'.
>> **** hstore can deserialize hstore format, but will rely on json/jsonb
>> for deserializing json
>>
>> 'Static document' represents edge cases where the json is opaque to
>> the database but performance -- for example large map polygons.
>> 'Precise document' represents cases where whitespace or key order is
>> important.
>>
>> Peter asked upthread how to access the various features. Well, today,
>> it basically means a bit of nimble casting to different structures
>> depending on which particular features are important to you, which
>> IMNSHO is not bad at all as long as we understand that most people who
>> rely on jsonb will also need hstore for its searching and operators.
>> Down the line when hstore and jsonb are more flushed out it's going to
>> come down to an API style choice.
>
> Frankly, a lot of the above doesn't make much sense to me. WTF is
> "Manipulation'?
>
> Unless I see much more actual info on the tests being conducted it's just
> about impossible to comment. The performance assessment at this stage is
> simply anecdotal as far as I'm concerned.

Er, I wasn't making performance assessments (except in cases where it
was obvious like poor support for arbitrary access with json) , but
API coverage of use cases. "Manipulation" I thought obvious: the
ability to manipulate the document (say, change some value to
something else): the nosql pattern. through the API. Neither json or
jsonb can do that at present...only hstore can. jsonb cant't; it only
covers some of what json type currently covers (but some of the thing
it does cover is much faster).

On Mon, Feb 24, 2014 at 11:31 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> Hm, I'm going to withdraw that. I had done some testing of simple
>> deserialization (cast to text and the like) and noted that jsonb was
>> as much as 5x slower. However, I just did some checking on
>> json[b]_populate_recordset though and it's pretty much a wash.
>
> Aside from that, I want our docs to make a strong endorsement of using
> jsonb over json for most users. jsonb will continue to be developed and
> improved in the future; it is very unlikely that json will. Maybe
> that's what I should say rather than anything about efficiency.

I would hope that endorsement doesn't extend to misinforming users.
Moreover, json type is handling all serialization at present and will
continue to do so for some years. In fact, in this release we got a
bunch of new very necessary enhancements (json_build) to
serialization! You're trying to deprecate and enhance the type at the
same time!

The disconnect here is that your statements would be correct if the
only usage for the json type would be for storing data in json.
However, people (including myself) are doing lots of wonderful things
storing data in the traditional way and moving into and out of json in
queries and that, besides working better in the json type, is only
possible in json. That might change in the future by figuring out a
way to cover json serialization cases through jsonb but that's not how
things work today, end of story.

Look, I definitely feel the frustration and weariness here in terms of
my critiquing the proposed API along with the other arguments I've
made. Please understand that nobody wants this to go out the door
more than me if the objective is to lock in the API 'as is' then let's
be polite to our users and try to document various use cases and
what's good at what.

merlin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2014-02-25 14:51:41 Re: often PREPARE can generate high load (and sometimes minutes long unavailability)
Previous Message Atri Sharma 2014-02-25 14:33:44 Re: [pgsql-advocacy] GSoC 2014 - mentors, students and admins