Re: jsonpath

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Oleg Bartunov <obartunov(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: jsonpath
Date: 2019-03-17 22:57:36
Message-ID: aae2a342-28ff-1801-1e8e-c80203001078@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 17.03.2019 21:29, Jonathan S. Katz wrote:
> On 3/17/19 1:14 PM, Alexander Korotkov wrote:
>> On Sun, Mar 17, 2019 at 8:06 PM Jonathan S. Katz <jkatz(at)postgresql(dot)org> wrote:
>>> On 3/17/19 1:02 PM, Alexander Korotkov wrote:
>>>> Thank you for the explanation. Is it jsonb_ops or jsonb_path_ops?
>>> I just used "USING gin(col)" so jsonb_ops.
>> I see. So, jsonb_ops extracts from this query only existence of
>> .length key. And I can bet it exists in all (or almost all) the
>> documents. Thus, optimizer thinks index might be useful, while it's
>> useless. There is not much can be done while we don't have statistics
>> for jsonb (and access to it from GIN extract_query). So, for now we
>> can just refuse from extracting only keys from jsonpath in jsonb_ops.
>> But I think it would be better to just document this issue. In future
>> we should improve that with statistics.
> That seems to make sense, especially given how I've typically stored
> JSON documents in PostgreSQL. It sounds like this particular problem
> would be solved appropriately with JSONB statistics.

GIN jsonb_ops extracts from query

data @? '$.length ? (@ < 150)'

the same GIN entries as from queries

data @? '$.length'
data ? 'length'

If you don't want to extract entries from unsupported expressions, you can try
to use another jsonpath operator @@. Queries will also look like a bit simpler:

data @@ '$.length < 150'
data @@ '$.content like_regex "^Start"'
data @@ '$.content like_regex "risk" flag "i"'

All this queries emit no GIN entries. But note that

data @@ '$ ? (@.content == "foo").length < 150'

emits the same entries as

data @@ '$.content == "foo"'

We already have a POC implementation of jsonb statistics that was written
2 years ago. I rebased it onto the current master yesterday. If it is
interesting, you can find it on my GitHub [1]. But note, that there is
no support for jsonpath operators yet, only boolean EXISTS ?, ?|, ?&, and
CONTAINS @> operators are supported. Also there is no docs, and it works
slowly (a more effective storage method for statistics of individual JSON
paths is needed).

Also there is ability to calculate derived statistics of expressions like

js -> 'x' -> 0 -> 'y'
js #> '{x,0,y}'

using jsonb statistics for columns "js". So the selectivity of expressions

js -> 'x' -> 0 -> 'y' = '123'
js #> '{x,0,y}' >= '123'

also can be estimated (but these expressions can't be used by index on "js").

This topic deserves a separate discussion. I hope, we will start the
corresponding thread for PG13 after we find a more effective way of jsonb
statistics storing.

[1] https://github.com/glukhovn/postgres/tree/jsonb_stats

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-03-17 23:19:00 Re: insensitive collations
Previous Message Tom Lane 2019-03-17 22:41:41 Re: Rare SSL failures on eelpout