Re: Is it possible to index "deep" into a JSONB column?

From: Shaheed Haque <shaheedhaque(at)gmail(dot)com>
To: Imre Samu <pella(dot)samu(at)gmail(dot)com>
Cc: "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Is it possible to index "deep" into a JSONB column?
Date: 2022-05-31 10:31:31
Message-ID: CAHAc2jewEk8Zm572o_0Qm9-uETfZNLKJOxCooTiRwowgoQXMoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I forgot to say...

I see the documentation on jsonpath indexing says:

"GIN index extracts statements of following form out of jsonpath:
accessors_chain = const. Accessors chain may consist of .key, [*], and
[index] accessors. jsonb_ops additionally supports .* and .**
accessors"

But I'm unable to make "CREATE INDEX...USING gin ((snapshot ->
'$.employee.* ? (@.pay_graph <> 0 || @.last_run_of_employment == true
|| @.state.employment[last][2] == 0)'))" trigger that.

On Tue, 31 May 2022 at 10:16, Shaheed Haque <shaheedhaque(at)gmail(dot)com> wrote:
>
> OK, I was able to translate your excellent note into this:
>
> CREATE INDEX foo ON paiyroll_payrun USING gin ((snapshot ->
> '$.employee.*.works_id'));
>
> and query using "@>" to see it in use:
>
> SELECT ... AND (snapshot -> '$.employee.*.works_id' @> '1091')...
>
> EXPLAIN ANALYSE...
> -> Bitmap Index Scan on foo (cost=0.00..8.22 rows=29
> width=0) (actual time=0.005..0.005 rows=0 loops=1)
> Index Cond: ((snapshot -> '$.employee.*.works_id'::text)
> @> '1091'::jsonb)
>
> Unfortunately, my real query involved multiple AND clauses which (as
> per another recent thread) seems to end up fetching the large
> (probably TOASTed) JSONB once for each AND clause at a cost of 150ms
> each. So, I got rid of the multiple ANDs by staying inside the
> jsonpath like this:
>
> SELECT ... WHERE (... AND
> (snapshot @? '$.employee."2209" ? (
> @.pay_graph <> 0 || @.last_run_of_employment == true ||
> @.state.employment[last][2] == 0)'
> )
> );
>
> But I have not been able to find an index formulation the new jsonpath
> can use. I tried adding
>
> CREATE INDEX ... USING gin ((snapshot -> '$.employee.*'));
> CREATE INDEX ... USING gin ((snapshot -> '$.employee'));
> CREATE INDEX bar ON paiyroll_payrun USING gin ((snapshot ->
> '$.employee.*.pay_graph'));
>
> Any thoughts on indexing for this case? If it makes any difference,
> I'm presently on PG12 and PG13 but looking to move to PG14.
>
> Thanks, Shaheed
>
> On Mon, 30 May 2022 at 19:59, Shaheed Haque <shaheedhaque(at)gmail(dot)com> wrote:
> >
> > Imre,
> >
> > I'm gradually working my way into the combination of SQL, JSONB and
> > jsonpath that this involves even without the indexing, but this looks
> > very helpful/promising, especially on the indexing. Thanks a lot for
> > the input,
> >
> > Shaheed
> >
> > On Mon, 30 May 2022 at 11:50, Imre Samu <pella(dot)samu(at)gmail(dot)com> wrote:
> > >
> > > > In principle, I believe this allows index-assisted access to keys and
> > > > values nested in arrays and inner objects but in practice, it seems
> > > > the planner "often" decides to ignore the index in favour of a table scan.
> > >
> > > part II. index usage ; see the "Bitmap Index Scan on jpqarr_idx"
> > >
> > > SET enable_seqscan = OFF;
> > > select * from jsonb_table
> > > where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT);
> > > ;
> > > +----+---------------------------------------------------------------------------------------------------------------+
> > > | id | jsonb_col |
> > > +----+---------------------------------------------------------------------------------------------------------------+
> > > | 2 | {"stuff": {}, "employee": {"7011": {"date_of_birth": "1970-01-01"}, "7122": {"date_of_birth": "1971-02-02"}}} |
> > > +----+---------------------------------------------------------------------------------------------------------------+
> > > (1 row)
> > >
> > >
> > > EXPLAIN ANALYZE select * from jsonb_table
> > > where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('1971-02-02'::TEXT);
> > > ;
> > > +---------------------------------------------------------------------------------------------------------------------------------------------------------+
> > > | QUERY PLAN |
> > > +---------------------------------------------------------------------------------------------------------------------------------------------------------+
> > > | Bitmap Heap Scan on jsonb_table (cost=3.00..4.52 rows=1 width=36) (actual time=0.056..0.059 rows=1 loops=1) |
> > > | Recheck Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false) @> to_jsonb('1971-02-02'::text)) |
> > > | Heap Blocks: exact=1 |
> > > | -> Bitmap Index Scan on jpqarr_idx (cost=0.00..3.00 rows=1 width=0) (actual time=0.026..0.027 rows=1 loops=1) |
> > > | Index Cond: (jsonb_path_query_array(jsonb_col, '$."employee".*."date_of_birth"'::jsonpath, '{}'::jsonb, false) @> to_jsonb('1971-02-02'::text)) |
> > > | Planning Time: 0.255 ms |
> > > | Execution Time: 0.122 ms |
> > > +---------------------------------------------------------------------------------------------------------------------------------------------------------+
> > > (7 rows)
> > >
> > > regards,
> > > Imre
> > >
> > >
> > > Imre Samu <pella(dot)samu(at)gmail(dot)com> ezt írta (időpont: 2022. máj. 30., H, 12:30):
> > >>
> > >> Hi Shaheed,
> > >>
> > >> > WHAT GOES HERE
> > >>
> > >> imho check the: jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' )
> > >>
> > >> may example:
> > >>
> > >> CREATE TABLE jsonb_table (
> > >> id serial primary key,
> > >> jsonb_col JSONB
> > >> );
> > >>
> > >> INSERT INTO jsonb_table(jsonb_col)
> > >> VALUES
> > >> ('{"stuff": {},"employee": {"8011": {"date_of_birth": "1980-01-01"},"8222": {"date_of_birth": "1982-02-02"}}}'),
> > >> ('{"stuff": {},"employee": {"7011": {"date_of_birth": "1970-01-01"},"7122": {"date_of_birth": "1971-02-02"}}}'),
> > >> ('{"stuff": {},"employee": {"a12": {"date_of_birth": "2000-01-01"},"b56": {"date_of_birth": "2000-02-02"}}}')
> > >> ;
> > >>
> > >> select jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) from jsonb_table;
> > >>
> > >> -- create index
> > >> create index jpqarr_idx
> > >> on jsonb_table
> > >> using gin ( jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth' ) );
> > >> -- tests:
> > >> select id from jsonb_table
> > >> where jsonb_path_query_array( jsonb_col, '$.employee.*.date_of_birth') @> to_jsonb('2000-01-01'::TEXT);
> > >> ;
> > >> DROP TABLE
> > >> CREATE TABLE
> > >> INSERT 0 3
> > >> +------------------------------+
> > >> | jsonb_path_query_array |
> > >> +------------------------------+
> > >> | ["1980-01-01", "1982-02-02"] |
> > >> | ["1970-01-01", "1971-02-02"] |
> > >> | ["2000-01-01", "2000-02-02"] |
> > >> +------------------------------+
> > >> (3 rows)
> > >>
> > >> CREATE INDEX
> > >> +----+
> > >> | id |
> > >> +----+
> > >> | 3 |
> > >> +----+
> > >> (1 row)
> > >>
> > >>
> > >> Regards,
> > >> Imre
> > >>
> > >> Shaheed Haque <shaheedhaque(at)gmail(dot)com> ezt írta (időpont: 2022. máj. 29., V, 22:53):
> > >>>
> > >>> Suppose I have a JSONB field called "snapshot". I can create a GIN
> > >>> index on it like this:
> > >>>
> > >>> create index idx1 on mytable using gin (snapshot);
> > >>>
> > >>> In principle, I believe this allows index-assisted access to keys and
> > >>> values nested in arrays and inner objects but in practice, it seems
> > >>> the planner "often" decides to ignore the index in favour of a table
> > >>> scan. (As discussed elsewhere, this is influenced by the number of
> > >>> rows, and possibly other criteria too).
> > >>>
> > >>> Now, I know it is possible to index inner objects, so that is snapshot
> > >>> looks like this:
> > >>>
> > >>> {
> > >>> "stuff": {},
> > >>> "more other stuff": {},
> > >>> "employee": {
> > >>> "1234": {"date_of_birth": "1970-01-01"},
> > >>> "56B789": {"date_of_birth": "1971-02-02"},
> > >>> }
> > >>> }
> > >>>
> > >>> I can say:
> > >>>
> > >>> create index idx2 on mytable using gin ((snapshot -> 'employee'));
> > >>>
> > >>> But what is the syntax to index only on date_of_birth? I assume a
> > >>> btree would work since it is a primitive value, but WHAT GOES HERE in
> > >>> this:
> > >>>
> > >>> create index idx3 on mytable using btree ((snapshot ->'employee' ->
> > >>> WHAT GOES HERE -> 'date_of_birth'));
> > >>>
> > >>> I believe an asterisk "*" would work if 'employee' was an array, but
> > >>> here it is nested object with keys. If it helps, the keys are
> > >>> invariably numbers (in quoted string form, as per JSON).
> > >>>
> > >>> Thanks, Shaheed
> > >>>
> > >>>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Danny Shemesh 2022-05-31 10:33:23 The use of partial, expressional indices in pg < 14
Previous Message Shaheed Haque 2022-05-31 09:16:51 Re: Is it possible to index "deep" into a JSONB column?