Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 07:40:45
Message-ID: CAM3SWZS-12EHfqvDzTbtkEsQHfVLJxx8XP1=jrCXK2wh9uKW-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Tue, Apr 8, 2014 at 11:37 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> As the code stands, you don't have a choice on any of those things. The
> decisions have been made by us, PostgreSQL developers. The only choice you
> have is between jsonb_ops and jsonb_hash_ops, with a strange combination of
> tradeoffs in both. Sure, they're still useful, if not optimal, for a
> wide-range of applications. For more complicated cases, you will have to
> resort to expression indexes. It bugs me greatly that the underlying indexam
> could do all those things, we're just not exposing the capability.

Why would you ever not have to use expression indexes? Idiomatic usage
of jsonb involves expression indexes because it's desirable to index
only a expression. People will want to do things like only index the
nested "tags" array far more frequently then they'll only want to
index keys (that is, Object pair keys) in the entire document. I don't
get why you'd say that they'd "resort" to expression indexes, like
they're a kludge. Have you ever tried out one of the new document
databases? I suggest you do. Expression indexes on jsonb map pretty
closely onto how you're frequently expected to index data in those
systems. That's something that they make heavy use of. Why would you
ever not really have to consider ahead of time what is important
enough to be indexed, and what is not?

> ISTM we need a way to parameterize opclasses, so that when you create the
> index, you specify the above things.

That would be nice.

> In the absence of parameterizable opclasses, it would be much more flexible
> to have opclasses that index, keys, values, key-value pairs and paths
> separately, instead of the current json_ops and json_hash_ops opclasses
> which index all of those in the same index. That way, if you only e.g. ever
> query on the existence of a key, you'd only need to index the keys.

I think only ever needing to index the keys is not a common use-case.
It pretty much makes exactly as much sense to do so as it would with
hstore, and yet hstore doesn't support that after all these years.

> I don't understand how we ended up with the current dichotomy of json_ops
> and json_hash_ops...

It makes sense if you consider jsonb_ops best suited to simpler
hstore-style indexing, while jsonb_hash_ops is best suited to testing
containment of JSON documents, potentially with lots of nesting. These
documents are typically homogeneous in structure. Idiomatic usage of
systems like MongoDB involves "collections" of fairly homogeneous
documents. If there is a lot of variability in their structure within
a collection, the collection more or less becomes impossible to
usefully query. They aim to be flexible, but still implicitly require
you to insert data with a half-way sensible/consistent structure. This
makes separately indexing the keys less than compelling as a default,
because there is so much duplication of keys in practice.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Heikki Linnakangas 2014-04-09 08:21:26 Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Previous Message Heikki Linnakangas 2014-04-09 06:37:24 Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

Browse pgsql-hackers by date

  From Date Subject
Next Message Nicolas Barbier 2014-04-09 08:01:07 Re: Proposal for Merge Join for Non '=' Operators
Previous Message Craig Ringer 2014-04-09 07:40:36 libpq api wart: no PQconnect variant that can consume output of PQconninfoParse(...)