Re: Unable to make use of "deep" JSONB index

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Shaheed Haque <shaheedhaque(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Unable to make use of "deep" JSONB index
Date: 2022-06-13 02:23:12
Message-ID: CAMkU=1wWSX_J+2ThKtvZXPqqSdFLHjsVC19DB0zAa2XhtaULGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Sun, Jun 12, 2022 at 3:31 PM Shaheed Haque <shaheedhaque(at)gmail(dot)com>
wrote:

> (Resend, wrong version was sent before)
>
> Thanks Erik. Is the point that the index has to be on the JSON field
> as a whole (i.e. "snapshot") rather than deep inside it (e.g.
> "snapshot.something.further[down]")?
>
> In my case, the snapshot is several MB in size (perhaps 10MB or even
> 20MB), dominated by the snapshot.employee (cardinality 10k, each sized
> as a dict 1-2kB as text). My expectation/guess is that an index of
> "snapshot" will itself be of a size of similar order. However the
> design as-is works very well except for this one case where to speed
> it up, in principle, the index need contain no more than one boolean
> per employee. So that's what I'd like to achieve, if possible.
>

It sounds like what you really want here is to extract just the list of the
ids which meet one of your three further criteria, and index that list.

You can do that with jsonpath, but you have to apply it with a function,
not one of the boolean-returning operators.

This almost works to do that:

create index on payrun using gin (jsonb_path_query_array(snapshot,
'$.employee.* ? (@.pay_graph <> 0 || @.last_run_of_employment == true ||
@.state.employment[last][2] == 0).id'));

Then query it like:

select * from payrun where jsonb_path_query_array(snapshot, '$.employee.* ?
(@.pay_graph <> 0 || @.last_run_of_employment == true ||
@.state.employment[last][2] == 0).id') ? '999';

The problem is that ? only tests for top-level text values, while in your
example the value for the 'id' keys are ints, not text. So they are not
findable with the ? operator. If I edit your example data to wrap the ids'
values in double quotes, turning them into json strings rather than json
ints, then this does work for me.

Maybe there is a way to modify the jsonpath so that it converts the ints to
text for you. But if there is such a way, I don't know what it is.

If I were doing this for my own system, I would probably create an
immutable user-defined-function which took a jsonb and returned an int[] of
the filtered id values. Then you would have to query it with @> rather
than ?.

> - I've not done things right, in which case I'd love to know my mistake.
> - It is not supposed to work, in which case it would be good to have
> that stated, and maybe have PG not allow useless indices to be
> created.
> - It is a bug.
>

I guess it is number one mixed with number two. The index you created is
useless for your intended purpose, but is not useless for every conceivable
purpose. It is not realistic to expect PostgreSQL to reject things just
because it is not obvious (to a computer) what you are getting up to.

Cheers,

Jeff

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Noah Misch 2022-06-13 03:45:51 Re: Extension pg_trgm, permissions and pg_dump order
Previous Message David G. Johnston 2022-06-13 00:04:40 Re: Unable to make use of "deep" JSONB index