Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Date: 2014-04-09 06:37:24
Message-ID: 5344EAA4.1050605@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On 04/09/2014 01:18 AM, Andrew Dunstan wrote:
>
> On 04/08/2014 05:57 PM, Peter Geoghegan wrote:
>> On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Well, let me see if I understand the situation correctly:
>>>
>>> * jsonb_ops supports more operators
>>>
>>> * jsonb_hash_ops produces smaller, better-performing indexes
>>>
>>> * jsonb_ops falls over on inputs with wide field values, but
>>> jsonb_hash_ops does not
>> There might be some compelling cases for indexing existence rather
>> than containment, since the recheck flag isn't set there, but in
>> general this summary seems sound. I would say that broadly, existence
>> is a less useful operator than containment, and so jsonb_hash_ops is
>> broadly more compelling. I didn't propose changing the default due to
>> concerns about the POLA, but I'm happy to be told that those concerns
>> were out of proportion to the practical benefits of a different
>> default.
>
> I tend to agree with Tom that POLA will be more violated by the default
> ops class not being able to index some values.

Yeah.

<rant>

Both of the operator classes are actually much less flexible than I'd
like. Firstly, they index everything. In many cases, that's not what you
want, so you end up with much larger indexes than necessary. Secondly,
jsonb_ops indexes all values separately from the keys. That makes the
index pretty much useless for a query on, say, WHERE json @>
'{"needs_processing":true}', if all the rows also contain a key-value
pair "active":true. Thirdly, inequality operators are not supported; you
can't search for rows with (the json-syntax equivalent of) "price <
12.3". Fourthly, sometimes you would want to include the "path" to an
entry in the key, sometimes not.

If I understood correctly the way jsonb_hash_ops works, the limitation
compared to jsonb_ops is that it cannot be used for foo ? 'bar' type
queries. And the reason for that limitation is that it hashes the whole
path to the key; the naked values are not indexes separately. But why
not? jsonb_ops does - why is that decision related to whether you hash
or not? Or it could index both. Sure, it would be wasteful when you
don't need to support foo ? 'bar', but the point is that it should be up
to the DBA to decide, based on his needs.

As the code stands, you don't have a choice on any of those things. The
decisions have been made by us, PostgreSQL developers. The only choice
you have is between jsonb_ops and jsonb_hash_ops, with a strange
combination of tradeoffs in both. Sure, they're still useful, if not
optimal, for a wide-range of applications. For more complicated cases,
you will have to resort to expression indexes. It bugs me greatly that
the underlying indexam could do all those things, we're just not
exposing the capability.

ISTM we need a way to parameterize opclasses, so that when you create
the index, you specify the above things.

</rant>

The ship has cleatly sailed to add parameterized opclasses to 9.4, but
let's keep it in mind when we decide on the defaults.

In the absence of parameterizable opclasses, it would be much more
flexible to have opclasses that index, keys, values, key-value pairs and
paths separately, instead of the current json_ops and json_hash_ops
opclasses which index all of those in the same index. That way, if you
only e.g. ever query on the existence of a key, you'd only need to index
the keys.

I don't understand how we ended up with the current dichotomy of
json_ops and json_hash_ops...

- Heikki

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Peter Geoghegan 2014-04-09 07:40:45 Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
Previous Message Robert Haas 2014-04-09 04:23:43 Re: Call for GIST/GIN/SP-GIST opclass documentation

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2014-04-09 06:42:36 Pointer to structure in ECPG
Previous Message Sergey Muraviov 2014-04-09 06:32:07 Re: Problem with displaying "wide" tables in psql