Re: jsonpath

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: jsonpath
Date: 2018-12-03 23:25:04
Message-ID: 38018365-5793-2827-e3c1-5a3b259fb6ae@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Attached 21-st version of the patches rebased onto the current master.

Added new patch 0004 with documentation written by Liudmila Mantrova, and
optional patch 0003 that was separated from 0002.

On 24.11.2018 23:03, Tomas Vondra wrote:

> Hi,
>
> I've done another round of reviews on v20, assuming the patch is almost
> ready to commit, but unfortunately I ran into a bunch of issues that
> need to be resolved. None of this is a huge issue, but it's also above
> the threshold of what could be tweaked by a committer IMHO.
Thank your for your review.
> (Which brings the question who plans to commit this. The patch does not
> have a committer in the CF app, but I see both Teodor and Alexander are
> listed as it's authors, so I'd expect it to be one of those. Or I might
> do that, of course.)
> 0001
> ----
>
> 1) to_timestamp() does this:
>
> do_to_timestamp(date_txt, VARDATA(fmt), VARSIZE_ANY_EXHDR(fmt), false,
> &tm, &fsec, &fprec, NULL);
>
> Shouldn't it really do VARDATA_ANY() instead of VARDATA()? It's what the
> function did before (well, it called text_to_cstring, but that does
> VARDATA_ANY). The same thing applies to to_date(), BTW.
>
> I also find it a bit inconvenient that we expand the fmt like this in
> all do_to_timestamp() calls, although it's just to_datetime() that needs
> to do it this way. I realize we can't change to_datetime() because it's
> external API, but maybe we should make it construct a varlena and pass
> it to do_to_timestamp().

Of course, there must be VARDATA_ANY() instead of of VARDATA(). But I decided
to construct varlena and pass it to do_to_timestamp() and to to_datetime().

> 2) We define both DCH_FF# and DCH_ff#, but we never ever use the
> lower-case version. Heck, it's not mentioned even in DCH_keywords, which
> does this:
>
> ...
> {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */
> ...
> {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE}, /* F */
> ...
>
> Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and
> "day".

Yes, "ff#" are mapped to DCH_FF# like "mi" is mapped DCH_MI.

"Day", "day" are not mapped to DCH_DAY because they determine letter case in the
output, but "ff1" and "FF#" output contains only digits.

> Also, the comment for "ff1" is wrong, it should be "f" I guess.
Fixed.
> And of course, there are regression tests for FF# variants, but not the
> lower-case ones.

Tests were added

> 3) Of course, these new formatting patterns should be added to SGML
> docs, for example to "Template Patterns for Date/Time Formatting" in
> func.sgml (and maybe to other places, I haven't looked for them very
> thoroughly).

Fixed.

> 4) The last block in DCH_from_char() does this
>
> while (*s == ' ')
> s++;
>
> but I suppose it should use isspace() just like the code immediately
> before it.

Fixed.

> 5) I might be missing something, but why is there the "D" at the end of
> the return flags from DCH_from_char?
>
> /* Return flags for DCH_from_char() */
> #define DCH_DATED 0x01
> #define DCH_TIMED 0x02
> #define DCH_ZONED 0x04

These terms "dated", "timed", and "zoned" are taken from the standard:

9.39.10) If <JSON datetime template> JDT is specified, ... :
a) If JDT contains <datetime template year>, ... then JDT is dated.
b) If JDT contains <datetime template 12-hour>, ... then JDT is timed.
c) If JDT contains <datetime template time zone hour>, ... then JDT is zoned.

> 0002
> ----
>
> 1) There are some unnecessary changes to to_datetime signature (date_txt
> renamed to vs. datetxt), which is mostly minor but unnecessary churn.
>
> 2) There are some extra changes to to_datetime (extra parameters, etc.).
> I wonder why those are not included in 0001, as part of the supporting
> datetime infrastructure.
Extra changes to to_datetime() were moved into patch 0001.

> 3) I'm not sure whether the _safe functions are a good or bad idea, but
> at the very least the comments should be updated to explain what it does
> (as the API has changed, obviously).


This functions were introduced for elimination of PG_TRY/PG_CATCH in jsonpath
code. But this error handling approach has a lot of issues (see below), so I
decided separate again them into optional patch 0004.

> 4) the json.c changes are under-documented, e.g. there are no comments
> for lex_peek_value, JsonEncodeDateTime doesn't say what tzp is for and
> whether it needs to be specified all the time, half of the functions at
> the end don't have comments (some of them are really simple, but then
> other simple functions do have comments).
>
> I don't know what the right balance here is (I'm certainly not asking
> for bogus comments just to have comments) and I agree that the existing
> code is not exactly abundant in comments. But perhaps having at least
> some would be nice.
Some new comments were added to json.c.
> The same thing applies to jsonpath.c and jsonpath_exec.c, I think. There
> are pretty much no comments whatsoever (at least at the function level,
> explaining what the functions do). It would be good to have a file-level
> comment too, explaining how jsonpath works, probably.

I hope to add detailed comments for jsonpath in the next version of the
patches.

> 5) I see uniqueifyJsonbObject now does this:
>
> if (!object->val.object.uniquified)
> return;
>
> That seems somewhat strange, considering the function says it'll
> uniqueify objects, but then exits when noticing the passed object is not
> uniquified?

'uniquified' field was rename to 'uniquify'.

> 6) Why do we make make_result inline? (numeric.c) If this needs to be
> marked with "inline" then perhaps all the _internal functions should be
> marked like that too? I have my doubts about the need for this.
>
>
> 7) The other places add _safe to functions that don't throw errors
> directly and instead update edata. Why are set_var_from_str, div_var,
> mod_var excluded from this convention?

I added "_safe" postfix only to functions having the both variants: safe (error
info is placed into *edata) and unsafe (errors are always thrown).

One-line make_result() is called from many unsafe places, so I decided to leave
it as it was. It can also be defined as a simple macro.

New "_internal" functions are called from jsonpath_exec.c, so they can't be
"static inline", but "extern inline" seems to be non-portable.

> 8) I wonder if the changes in numeric can have negative impact on
> performance. Has anyone done any performance tests of this part?

I've tried to measure this impact by evaluation of expression with dozens of '+'
or even by adding a loop into numeric_add(), but failed to get any measurable
difference.

> 0003
> ----
>
> 1) jsonb_gin.c should probably add comments briefly explaining what
> JsonPathNode, GinEntries, ExtractedPathEntry, ExtractedJsonPath and
> JsonPathExtractionContext are for.
A lot of comments were added. Also major refactoring of jsonb_gin.c was
done.

> 2) I find it a bit suspicious that there are no asserts in json_gin.c
> (well, there are 3 in the existing code, but nothing in the new code,
> and the patch essentially doubles the amount of code here).
I have managed to find only 4 similar places suitable for asserts.

> No comments for 0004 at this point.
>
> regards

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Preliminary-datetime-infrastructure-v21.patch text/x-patch 35.2 KB
0002-Jsonpath-engine-and-operators-v21.patch text/x-patch 319.9 KB
0003-Remove-PG_TRY-in-jsonpath-arithmetics-v21.patch text/x-patch 28.0 KB
0004-Jsonpath-docs-v21.patch text/x-patch 40.1 KB
0005-Jsonpath-GIN-support-v21.patch text/x-patch 47.0 KB
0006-Jsonpath-syntax-extensions-v21.patch text/x-patch 47.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-12-04 00:07:40 GIN predicate locking slows down valgrind isolationtests tremendously
Previous Message Alvaro Herrera 2018-12-03 22:50:19 don't mark indexes invalid unnecessarily