Re: jsonpath

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Oleg Bartunov <obartunov(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: jsonpath
Date: 2019-03-22 12:14:10
Message-ID: CAPpHfdu6meyXZtyWL6vw0P54tJO8VCUL5my5qcfi29C4DDvACQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 22, 2019 at 5:38 AM John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> wrote:
> On Thu, Mar 21, 2019 at 9:59 PM Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> > Attaches patches improving jsonpath parser. First one introduces
> > cosmetic changes, while second gets rid of backtracking. I'm also
> > planning to add high-level comment for both grammar and lexer.
>
> The cosmetic changes look good to me. I just noticed a couple things
> about the comments.
>
> 0001:
>
> +/* Check if current scanstring value constitutes a keyword */
>
> 'is a keyword' is better. 'Constitutes' implies parts of a whole.
>
> + * Resize scanstring for appending of given length. Reinitilize if required.
>
> s/Reinitilize/Reinitialize/
>
> The first sentence is not entirely clear to me.

Thank you, fixed.

> 0002:
>
> These two rules are not strictly necessary:
>
> <xnq,xq,xvq,xsq>{unicode}+\\ {
> /* throw back the \\, and treat as unicode */
> yyless(yyleng - 1);
> parseUnicode(yytext, yyleng);
> }
>
> <xnq,xq,xvq,xsq>{hex_char}+\\ {
> /* throw back the \\, and treat as hex */
> yyless(yyleng - 1);
> parseHexChars(yytext, yyleng);
> }
>
> ...and only seem to be there because of how these are written:
>
> <xnq,xq,xvq,xsq>{unicode}+ { parseUnicode(yytext, yyleng); }
> <xnq,xq,xvq,xsq>{hex_char}+ { parseHexChars(yytext, yyleng); }
> <xnq,xq,xvq,xsq>{unicode}*{unicodefail} { yyerror(NULL, "Unicode
> sequence is invalid"); }
> <xnq,xq,xvq,xsq>{hex_char}*{hex_fail} { yyerror(NULL, "Hex character
> sequence is invalid"); }
>
> I don't understand the reasoning here -- is it a micro-optimization?
> The following is simpler, allow the rules I mentioned to be removed,
> and make check still passes. I would prefer it unless there is a
> performance penalty, in which case a comment to describe the
> additional complexity would be helpful.
>
> <xnq,xq,xvq,xsq>{unicode} { parseUnicode(yytext, yyleng); }
> <xnq,xq,xvq,xsq>{hex_char} { parseHexChars(yytext, yyleng); }
> <xnq,xq,xvq,xsq>{unicodefail} { yyerror(NULL, "Unicode sequence is invalid"); }
> <xnq,xq,xvq,xsq>{hex_fail} { yyerror(NULL, "Hex character sequence is
> invalid"); }

These rules are needed for unicode. Sequential escaped unicode
characters might be connected by hi surrogate value. See
jsonpath_encoding regression test in attached patch.

Regarding hex, I made it so for the sake of uniformity. But I changed
my mind and decided that simpler flex rules are more important. So,
now they are considered one-by-one.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-cosmetic-changes-jsonpath-parser-2.patch application/octet-stream 19.9 KB
0002-get-rid-of-backtracking-2.patch application/octet-stream 34.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jesper Pedersen 2019-03-22 12:14:33 Re: speeding up planning with partitions
Previous Message Heikki Linnakangas 2019-03-22 12:03:59 Re: GiST VACUUM