Re: jsonpath

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: jsonpath
Date: 2018-11-06 14:31:46
Message-ID: 9a175c2d-b0a9-475f-078b-febc185282ff@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 29.10.2018 2:20, Tomas Vondra wrote:

> On 10/02/2018 04:33 AM, Michael Paquier wrote:
>> On Sat, Sep 08, 2018 at 02:21:27AM +0300, Nikita Glukhov wrote:
>>> Attached 18th version of the patches rebased onto the current master.
>> Nikita, this version fails to apply, as 0004 has conflicts with some
>> regression tests. Could you rebase? I am moving the patch to CF
>> 2018-11, waiting for your input.
>> --
>> Michael
>>
> As Michael mentioned, the patch does not apply anymore, so it would be
> good to provide a rebased version for CF 2018-11. I've managed to do
> that, as the issues are due to minor bitrot, so that I can do some quick
> review of the current version.

Sorry that I failed to provide rebased version earlier.

Attached 19th version of the patches rebased onto the current master.

> I haven't done much testing, pretty much just compiling, running the
> usual regression tests and reading through the patches. I plan to do
> more testing once the rebased patch is submitted.
>
> Now, a couple of comments based on eye-balling the diffs.

Thank you for your review.

> 1) There are no docs, which makes it pretty much non-committable for
> now. I know there is [1] and it was a good intro for the review, but we
> need something as a part of our sgml docs.

I think that jsonpath part of documentation can be extracted from [2] and
added to the patch set.

> 2) 0001 says this:
>
> *typmod = -1; /* TODO implement FF1, ..., FF9 */
>
> but I have no idea what FF1 or FF9 are. I guess it needs to be
> implemented, or explained better.

FF1-FF9 are standard datetime template fields used for specifying of fractional
seconds. FF3/FF6 are analogues of our MS/US. I decided simply to implement
this feature (see patch 0001, I also can supply it in the separate patch).

But FF7-FF9 are questionable since the maximal supported precision is only 6.
They are optional by the standard:

95) Specifications for Feature F555, “Enhanced seconds precision”:
d) Subclause 9.44, “Datetime templates”:
i) Without Feature F555, “Enhanced seconds precision”,
a <datetime template fraction> shall not be FF7, FF8 or FF9.

So I decided to allow FF7-FF9 only on the output in to_char().

> 3) The makefile rule for scan.o does this:
>
> +# Latest flex causes warnings in this file.
> +ifeq ($(GCC),yes)
> +scan.o: CFLAGS += -Wno-error
> +endif
>
> That seems a bit ugly, and we should probably try to make it work with
> the latest flex, instead of hiding the warnings. I don't think we have
> any such ad-hoc rules in other Makefiles. If we really need it, can't we
> make it part of configure, and/or restrict it depending on flex version?

These lines seem to belong to the earliest versions of our jsonpath
implementation. There is no scan.o file now at all, there is only
jsonpath_scan.o, so I simply removed these lines.

> 4) There probably should be .gitignore rule for jsonpath_gram.h, just
> like for other generated header files.

I see 3 rules in /src/backend/utils/adt/.gitignore:

/jsonpath_gram.h
/jsonpath_gram.c
/jsonpath_scan.c

> 5) jbvType says jbvDatetime is "virtual type" but does not explain what
> it is. IMHO that deserves a comment or something.

"Virtual type" means here that it is used only for in-memory processing and
converted into JSON string when outputted to jsonb. Corresponding comment
was added.

> 6) I see the JsonPath definition says this about header:
>
> /* just version, other bits are reservedfor future use */
>
> but the very next thing it does is defining two pieces stored in the
> header - version AND "lax" mode flag. Which makes the comment invalid
> (also, note the missing space after "reserved").

Fixed.

> FWIW, I'd use JSONPATH_STRICT instead of JSONPATH_LAX. The rest of the
> codebase works with "strict" flags passed around, and it's easy to
> forget to negate the flag somewhere (at least that's my experience).

Jsonpath lax/strict mode flag is used only in executeJsonPath() where it is
saved in "laxMode" field. New "strict" flag passed to datetime functions
is unrelated to jsonpath.

> 7) I see src/include/utils/jsonpath_json.h adds support for plain json
> by undefining various jsonb macros and redirecting them to the json
> variants. I find that rather suspicious - imagine you're investigating
> something in code using those jsonb macros, and wondering why it ends up
> calling the json stuff. I'd be pretty confused ...

I agree, this is rather simple but doubtful solution. That's why json support
was in a separate patch until the 18th version of the patches.

But if we do not want to compile jsonpath.c twice with different definitions,
then we need some kind of run-time wrapping over json strings and jsonb
containers, which seems a bit harder to implement.

To simplify debugging I can also suggest to explicitly preprocess jsonpath.c
into jsonpath_json.c using redefinitions from jsonpath_json.h before its
compilation.

> Also, some of the redefinitions are not really needed for example
> JsonbWrapItemInArray and JsonbWrapItemsInArray are not used anywhere
> (and neither are the json variants).

These definitions will be used in the next patches, so I removed them
for now from this patch.

> 8) I see 0002 defines IsAJsonbScalar like this:
>
> #define IsAJsonbScalar(jsonbval) (((jsonbval)->type >= jbvNull && \
> (jsonbval)->type <= jbvBool) || \
> (jsonbval)->type == jbvDatetime)
>
> while 0004 does this
>
> #define jspIsScalar(type) ((type) >= jpiNull && (type) <= jpiBool)
>
> I know those are for different data types (jsonb vs. jsonpath), but I
> suppose jspIsScalar should include the datetime too.

jpiDatetime does not mean the same thing as jbvDatetime.
jpiDatetime is a representation of ".datetime()" item method.
jbvDatetime is a representation of datetime SQL/JSON items.

Also datetime SQL/JSON items can be represented in JSON path only by strings
with ".datetime()" method applied, they do not have their own literal:
'"2018-01-10".datetime()'

> FWIW JsonPathItemType would deserve better documentation what the
> various items mean (a comment for each line would be enough). I've been
> wondering if "jpiDouble" means scalar double value until I realized
> there's a ".double()" function for paths.

I added per-item comments.

> 9) It's generally a good idea to make the individual pieces committable
> separately, but that means e.g. the regression tests have to pass after
> each patch. At the moment that does not seem to be the case for 0002,
> see the attached file. I'm running with -DRANDOMIZE_ALLOCATED_MEMORY,
> not sure if that's related.

This should definitely be a bug in json support, but I can't reproduce
it simply by defining -DRANDOMIZE_ALLOCATED_MEMORY. Could you provide
a stack trace at least?

[1] https://github.com/obartunov/sqljsondoc/blob/master/README.jsonpath.md
[2] https://github.com/postgrespro/sqljson/tree/sqljson_doc

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Preliminary-datetime-infrastructure-v19.patch text/x-patch 32.1 KB
0002-Jsonpath-engine-and-operators-v19.patch text/x-patch 343.8 KB
0003-Jsonpath-GIN-support-v19.patch text/x-patch 42.8 KB
0004-Jsonpath-syntax-extensions-v19.patch text/x-patch 46.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Gierth 2018-11-06 14:36:31 Re: Optimizing nested ConvertRowtypeExpr execution
Previous Message Pavel Stehule 2018-11-06 14:23:50 Re: PostgreSQL vs SQL/XML Standards