JSON Path and GIN Questions

From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: JSON Path and GIN Questions
Date: 2023-09-13 00:16:53
Message-ID: 15DD78A5-B5C4-4332-ACFE-55723259C07F@justatheory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings Hackers,

Been a while! I’m working on some experiments with JSONB columns and GIN indexes, and have operated on the assumption that JSON Path operations would take advantage of GIN indexes, with json_path_ops as a nice optimization. But I’ve run into what appear to be some inconsistencies and oddities I’m hoping to figure out with your help.

For the examples in this email, I’m using this simple table:

CREATE TABLE MOVIES (id SERIAL PRIMARY KEY, movie JSONB NOT NULL);
\copy movies(movie) from PROGRAM 'curl -s https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json | jq -c ".[]" | sed "s|\\\\|\\\\\\\\|g"';
create index on movies using gin (movie);
analyze movies;

That gives me a simple table with around 3600 rows. Not a lot of data, but hopefully enough to demonstrate the issues.

Issue 1: @@ vs @?
-----------------

I have been confused as to the difference between @@ vs @?: Why do these return different results?

david=# select id from movies where movie @@ '$ ?(@.title == "New Life Rescue")';
id
----
(0 rows)

david=# select id from movies where movie @? '$ ?(@.title == "New Life Rescue")';
id
----
10
(1 row)

I posted this question on Stack Overflow (https://stackoverflow.com/q/77046554/79202), and from the suggestion I got there, it seems that @@ expects a boolean to be returned by the path query, while @? wraps it in an implicit exists(). Is that right?

If so, I’d like to submit a patch to the docs talking about this, and suggesting the use of jsonb_path_query() to test paths to see if they return a boolean or not.

Issue 2: @? Index Use
---------------------

From Oleg’s (happy belated birthday!) notes (https://github.com/obartunov/sqljsondoc/blob/master/jsonpath.md#jsonpath-operators):

> Operators @? and @@ are interchangeable:
>
> js @? '$.a' <=> js @@ 'exists($.a)’
> js @@ '$.a == 1' <=> js @? '$ ? ($.a == 1)’

For the purposes of the above example, this appears to hold true: if I wrap the path query in exists(), @@ returns a result:

david=# select id from movies where movie @@ 'exists($ ?(@.title == "New Life Rescue"))';
id
----
10
(1 row)

Yay! However, @@ and @? don’t seem to use an index the same way: @@ uses a GIN index while @? does not.

Or, no, fiddling with it again just now, I think I have still been confusing these operators! @@ was using the index with an an explicit exists(), but @? was not…because I was still using an explicit exists.

In other words:

* @@ 'exists($ ?($.year == 1944))' Uses the index
* @? '$ ?(@.year == 1944)' Uses the index
* @? 'exists($ ?($.year == 1944))' Does not use the index

That last one presumably doesn’t work, because there is an implicit exists() around the exists(), making it `exists(exists($ ?($.year == 1944)))`, which returns true for every row (true and false both exists)! 🤦🏻‍♂️.

Anyway, if I have this right, I’d like to flesh out the docs a bit.

Issue 3: Index Use for Comparison
---------------------------------

From the docs (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I had assumed any JSON Path query would be able to use the GIN index. However while the use of the == JSON Path operator is able to take advantage of the GIN index, apparently the >= operator cannot:

david=# explain analyze select id from movies where movie @? '$ ?($.year >= 2023)';
QUERY PLAN ---------------------------------------------------------------------------------------------------------
Seq Scan on movies (cost=0.00..3741.41 rows=366 width=4) (actual time=34.815..36.259 rows=192 loops=1)
Filter: (movie @? '$?($."year" >= 2023)'::jsonpath)
Rows Removed by Filter: 36081
Planning Time: 1.864 ms
Execution Time: 36.338 ms
(5 rows)

Is this expected? Originally I tried with json_path_ops, which I can understand not working, since it stores hashes of paths, which would allow only exact matches. But a plain old GIN index doesn’t appear to work, either. Should it? Is there perhaps some other op class that would allow it to work? Or would I have to create a separate BTREE index on `movie -> 'year'`?

Thanks your your patience with my questions!

Best,

David

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gurjeet Singh 2023-09-13 00:25:44 Re: Document that server will start even if it's unable to open some TCP/IP ports
Previous Message Andres Freund 2023-09-12 23:49:01 Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?