Re: jsonpath

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Oleg Bartunov <obartunov(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: jsonpath
Date: 2019-03-05 21:40:25
Message-ID: 220821fb-52c0-c277-0959-322c8eef76b2@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached 36th version of the patches.

On 03.03.2019 21:08, Tomas Vondra wrote:

> Hi,
>
> Here are some initial comments from a review of the 0001 part. I plan to
> do more testing on a large data set and additional round of review over
> the next week. FWIW I've passed this through valgrind and the usual
> battery of regression tests, and there were no issues.

Thanks again for your review.

> I haven't looked at 0002 yet, but it seems fairly small (especially
> compared to 0001).
>
> func.sgml
> =========
>
> 1) I see the changes removed <indexterm zone="functions-json"> for some
> reason. Is that intentional?

Fixed.

> 2) <command>WHERE</command>
>
> We generally tag <literal>WHERE</literal>, not <command>.

Fixed.

> 3) Filter expressions are applied from left to right
>
> Perhaps s/applied/evaluated/ in reference to expressions?

Fixed.

> 4) The result of the filter expression may be true, false, or unknown.
>
> It's not entirely clear to me what "unknown" means here. NULL, or
> something else? There's a section in the SQL/JSON standard
> explaining this (page 83), but perhaps we should explain it a bit
> here too?
>
> The standard says "In the SQL/JSON data model, there are no SQL
> nulls, so Unknown is not part of the SQL/JSON data model." so I'm a
> bit confused what "unknown" references to. Maybe some example?
>
> Also, what happens when the result is unknown?

"unknown" refers here to ordinary three-valued logical Unknown, which is
represented in SQL by NULL.

JSON path expressions return sequences of SQL/JSON items, which are defined by
SQL/JSON data model. But JSON path predicates (logical expressions), which are
used in filters, return three-valued logical values: False, True, or Unknown.

Filters accept only items for which the filter predicate returned True. False
and Unknown results are skipped.

Unknown can be checked with IS UNKNOWN predicate:

SELECT jsonb_path_query_array('[1, "1", true, null]',
'$[*] ? ((@ < 2) is unknown)');
jsonb_path_query_array
------------------------
["1", true]
(1 row)

Comparison of JSON nulls to non-nulls returns always False, not Unknown (see
SQL/JSON data model). Comparison of non-comparable items return Unknown.

> 5) There's an example showing how to apply filter at a certain level,
> using the @ variable, but what if we want to apply multiple filters at
> different levels? Would it make sense to add such example?

Examples were added.

Filters can be nested, but by the standard it is impossible to reference item
of the outer filter, because @ always references current item of the innermost
filter. We have additional patch with a simple jsonpath syntax extension for
this case -- @N:
@0, like @, references innermost item
@1 references item one level upper
@2 references item two levels upper

For example, selecting all objects from array
'[{"vals": [1,2,3], "val": 2}, {"vals": [4,5], "val": 6}]'

having in their .vals[] element greater than their .val field:
'$[*] ? (@.vals[*] ? (@ > @1.val))'

It is impossible to do this by the standard in the single jsonpath expression.

Also there is idea to use lambda expressions with ECMAScript 6 syntax in
filters and user method (see below):

'$[*] ? (obj => obj.vals[*] ? (val => val > obj.val))'

I already have a patch implementing lambda expressions, which were necessary
for implementaion of built-in/user-written functions and methods like
.map(), reduce(), max(). I can post it again if it is interesting.

> 6) ... extensions of the SQL/JSON standard
>
> I'm not sure what "extension" is here. Is that an extension defined
> in the SQL standard, or an additional PostgreSQL functionality not
> described in the standard? (I assume the latter, just checking.)

Yes, this is additional functionality.

"Writing the path as an expression is also valid"
I have moved this into SQL/JSON patch, because it is related only path
specification in SQL/JSON functions.

> 7) There are references to "SQL/JSON sequences" without any explanation
> what it means. Maybe I'm missing something obvious, though.

SQL/JSON sequence is a sequence of SQL/JSON items.
The corresponding definition was added.

> 8) Implicit unwrapping in the lax mode is not performed in the following
> cases:
>
> I suggest to reword it like this:
>
> In the lax mode, implicit unwrapping is not performed when:

Fixed. I also decided to merge this paragraph with paragraph describing
auto-unwrapping.

> 9) We're not adding the datetime() method for now, due to the issues
> with timezones. I wonder if we should add a note why it's missing to the
> docs ...

The corresponding note was added.

> 10) I'm a bit puzzled though, because the standard says this in the
> description of type() function on page 77
>
> If I is a datetime, then “date”, “time without time zone”, “time
> with time zone”, “timestamp without time zone”, or “timestamp with
> time zone”, as appropriate.
>
> But I see our type() function does not return anything like that (which
> I presume is independent of timezone stuff). But I see jsonb.h has no
> concept of datetime values, and the standard actually says this in the
> datetime() section
>
> JSON has no datetime types. Datetime values are most likely stored
> in character strings.
>
> Considering this, the type() section makes little sense, no?

According to the SQL/JSON data model, SQL/JSON items can be null, string,
boolean, numeric, and datetime.

Datetime items exists only at execution time, they are serialized into JSON
strings when the resulting SQL/JSON item is converted into jsonb. After removal
of .datetime() method, support of datetime SQL/JSON items was removed from
jsonpath executor too.

Numeric items can be of any numeric type. In PostgreSQL we have the following
numeric datatypes: integers, floats and numeric. But our jsonpath executor
supports now only numeric-typed items, because this is only type we can get
directly from jsonb. Support for other numeric datatypes (float8 is most
necessary, because it is produced by .double() item method) can be added by
extending JsonbValue or by introducing struct JsonItem in executor, having
JsonbValue as a part (see patch #4 in v34).

> jsonb_util.c
> ============
>
> I see we're now handling NaN values in convertJsonbScalar(). Isn't it
> actually a bug that we don't do this already? Or is it not needed for
> some reason?

Numeric JsonbValues created outside of jsonpath executor cannot be NaN, because
this case in explicitly handled in JsonbValue-producing functions. For example,
datum_to_jsonb() which is used in to_jsonb(), jsonb_build_array() and others
converts Inf and NaN into JSON strings. But jsonb_plperl and jsonb_plpython
transforms do not allow NaNs (I think it is needed only for consistency of
"PL => jsonb => PL" roundtrip).

In our jsonpath executor we can produce NaNs and we should not touch them before
conversion to the resulting jsonb. Moreover, in SQL/JSON functions numeric
SQL/JSON item can be directly converted into numeric RETURNING type. So, I
decided to add a check for NaN to the low-level convertJsonbScalar() instead of
checking items before every call to JsonbValueToJsonb() or pushJsonbValue().
But after introduction of struct JsonItem (see patch #4) introduction there will
appropriate place for this check -- JsonItemToJsonbValue().

> ==========
>
> I suppose this should say "jsonpath version number" instead?
>
> elog(ERROR, "unsupported jsonb version number %d", version);

Fixed.

On 05.03.2019 2:27, Tomas Vondra wrote:
> A bunch of additional comments, after looking at the patch a bit today.
> All are mostly minor, and sometime perhaps a matter of preference.
>
>
> 1) There's a mismatch between the comment and actual function name for
> jsonb_path_match_opr and jsonb_path_exists_opr(). The comments say
> "_novars" instead.

Fixed.

> 2) In a couple of switches the "default" case does a return with a
> value, following elog(ERROR). So it's practically unreachable, AFAICS
> it's fine without it, and we don't do this elsewhere. And I don't get
> any compiler warnings if I remove it either.
>
> Examples:
>
> JsonbTypeName
>
> default:
> elog(ERROR, "unrecognized jsonb value type: %d", jbv->type);
> return "unknown";
>
> jspOperationName
>
> default:
> elog(ERROR, "unrecognized jsonpath item type: %d", type);
> return NULL;
>
> compareItems
>
> default:
> elog(ERROR, "unrecognized jsonpath operation: %d", op);
> return jpbUnknown;

It seems to be a standard practice in jsonb code, so we followed it.

> 3) jsonpath_send is using makeStringInfo() for a value that is not
> returned - IMHO it should use regular stack-allocated variable and use
> initStringInfo() instead

Fixed.

> 4) the version number should be defined/used as a constant, not as a
> magic constant somewhere in the code

Fixed.

> 5) Why does jsonPathToCstring do this?
>
> appendBinaryStringInfo(out, "strict ", 7);
>
> Why not to use regular appendStringInfoString()? What am I missing?

appendStringInfoString() is a bit slower than appendBinaryStringInfo()
because to strlen() call inside it.

> 6) comment typo: "Aling StringInfo"

Fixed.

> 7) alignStringInfoInt() should explain why we need this and why INTALIGN
> is the right alignment.

Comment was added.

> 8) I'm a bit puzzled by what flattenJsonPathParseItem does with 'next'
>
> I don't quite understand what it's doing with 'next' value?
>
> /*
> * Actual value will be recorded later, after next and children
> * processing.
> */
> appendBinaryStringInfo(buf,
> (char *) &next, /* fake value */
> sizeof(next));
>
> Perhaps a comment explaining it (why we need a fake value at all?) would
> be a good idea here.

No fake value is needed here, zero placeholder is enough. I have refactored
this using new function reserveSpaceForItemPointer().

> 9) I see printJsonPathItem is only calling check_stack_depth while
> flattenJsonPathParseItem also calls CHECK_INTERRUPTS. Why the
> difference, considering they seem about equally expensive?

CHECK_INTERRUPT() was added to printJsonPathItem() too.

> 10) executeNumericItemMethod is missing a comment (unlike the other
> executeXXX functions)

Comment was added.

> 11) Wording of some of the error messages in the execute methods seems a
> bit odd. For example executeNumericItemMethod may complain that it
>
> ... is applied to not a numeric value
>
> but perhaps a more natural wording would be
>
> ... is applied to a non-numeric value
>
> And similarly for the other execute methods. But I'm not a native
> speaker, so perhaps the original wording is just fine.

Error messages were changed on the advice of Robert Haas to:
"... can only be applied to a numeric value"

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Partial-implementation-of-SQL-JSON-path-language.patch.gz application/gzip 53.2 KB
0002-Suppression-of-numeric-errors-in-jsonpath.patch.gz application/gzip 5.9 KB
0003-Implementation-of-datetime-in-jsonpath.patch.gz application/gzip 22.5 KB
0004-Jsonpath-support-for-json.patch.gz application/gzip 34.1 KB
0005-Jsonpath-GIN-support.patch.gz application/gzip 9.1 KB
0006-Jsonpath-syntax-extensions.patch.gz application/gzip 12.1 KB

In response to

Responses

  • Re: jsonpath at 2019-03-10 10:51:45 from Alexander Korotkov

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2019-03-05 22:02:26 Re: Patch to document base64 encoding
Previous Message Filip Rembiałkowski 2019-03-05 21:27:02 fix for BUG #3720: wrong results at using ltree