Re: Fix parsing of identifiers in jsonpath

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Fix parsing of identifiers in jsonpath
Date: 2019-10-02 13:10:18
Message-ID: f6b0228f-71c4-2d21-68c0-dcfa110d18ed@postgrespro.ru
Lists: pgsql-hackers

Attached is the v2 patch, rebased onto current master.

On 18.09.2019 18:10, Nikita Glukhov wrote:

> Unfortunately, the jsonpath lexer, in contrast to the jsonpath parser, was
> written by Teodor and me without proper attention to the standard. The JSON
> path lexical grammar is borrowed from the external ECMAScript standard [1],
> and we did not study it carefully.
>
> There were numerous deviations from the ECMAScript standard in our jsonpath
> implementation, most of which are fixed in the attached patch:
>
> 1. Identifiers (unquoted JSON key names) should start with one of (see [2]):
> - Unicode symbol having Unicode property "ID_Start" (see [3])
> - Unicode escape sequence '\uXXXX' or '\u{X...}'
> - '$'
> - '_'
>
> And they should continue with one of:
> - Unicode symbol having Unicode property "ID_Continue" (see [3])
> - Unicode escape sequence
> - '$'
> - ZWNJ
> - ZWJ
>
> 2. '$' is also allowed inside the identifiers, so it is possible to write
> something like '$.a$$b'.
>
> 3. Variable references '$var' are regular identifiers that simply start with
> the '$' sign, and there is no syntax like '$"var"', because quotes are not
> allowed in identifiers.
>
> 4. Even if a Unicode escape sequence '\uXXXX' is used, it cannot produce
> special symbols or whitespace, because identifiers are displayed without
> quoting (i.e. '$\u{20}' cannot be displayed as '$" "', let alone as the
> string '"$ "').
>
> 5. All codepoints in '\u{XXXXXX}' greater than 0x10FFFF should be forbidden.
>
> 6. The six single-character escape sequences (\b \t \r \f \n \v) should only
> be supported inside quoted strings.
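
The escape rules in items 4-6 can be sketched roughly as follows. This is a
standalone Python illustration, not code from the patch: the names `unescape`,
`decode_unicode_escape` and `SPECIAL_CHARS` are hypothetical, and the exact set
of special characters is an assumption made only for the example.

```python
import re

MAX_CODEPOINT = 0x10FFFF

# \uXXXX (exactly four hex digits) or \u{X...} (one or more hex digits)
UNICODE_ESCAPE = re.compile(r"\\u(?:([0-9A-Fa-f]{4})|\{([0-9A-Fa-f]+)\})")

# Item 6: escapes that are accepted only inside quoted strings.
SINGLE_CHAR_ESCAPES = {"b": "\b", "t": "\t", "r": "\r",
                       "f": "\f", "n": "\n", "v": "\v"}

# Assumption: an illustrative set of characters that an escape must not
# produce inside an unquoted identifier (the real lexer's set may differ).
SPECIAL_CHARS = set("\"'.[](){}*?@,:=<>!&|+-/%")


def decode_unicode_escape(text, pos):
    """Decode the \\u escape starting at text[pos]; return (char, next_pos)."""
    m = UNICODE_ESCAPE.match(text, pos)
    if m is None:
        raise ValueError("invalid Unicode escape at position %d" % pos)
    codepoint = int(m.group(1) or m.group(2), 16)
    if codepoint > MAX_CODEPOINT:  # item 5
        raise ValueError("codepoint U+%X is out of range" % codepoint)
    return chr(codepoint), m.end()


def unescape(text, in_quoted_string):
    """Resolve escapes in an identifier or in the body of a quoted string.

    Escapes not named in the list above (such as \\" or \\\\) are out of
    scope for this sketch and are rejected.
    """
    out = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch != "\\":
            out.append(ch)
            pos += 1
            continue
        nxt = text[pos + 1:pos + 2]
        if nxt == "u":
            decoded, pos = decode_unicode_escape(text, pos)
            # Item 4: identifiers are printed unquoted, so an escape must not
            # smuggle in whitespace or special symbols.
            if not in_quoted_string and (decoded.isspace()
                                         or decoded in SPECIAL_CHARS):
                raise ValueError("escape yields a character that is not "
                                 "allowed in an identifier")
            out.append(decoded)
        elif in_quoted_string and nxt in SINGLE_CHAR_ESCAPES:
            out.append(SINGLE_CHAR_ESCAPES[nxt])
            pos += 2
        else:
            raise ValueError("unsupported escape at position %d" % pos)
    return "".join(out)
```

With this sketch, `unescape(r"$\u{20}", in_quoted_string=False)` raises
ValueError, while the same escape inside a quoted string is accepted.
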
>
>
> I don't know if it is possible to check the Unicode properties "ID_Start" and
> "ID_Continue" in Postgres, or what ZWNJ/ZWJ are. For now, the set of
> characters that may start an identifier is simply determined by excluding all
> recognized special characters.
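
For illustration only, the start/continue rules from item 1 can be
approximated outside Postgres with Unicode general categories. The sketch
below uses Python's `unicodedata` module and follows the UAX #31 derivation
(ID_Start is roughly L + Nl, ID_Continue adds Mn, Mc, Nd and Pc); it ignores
the small Other_ID_Start / Other_ID_Continue sets, so it is an approximation
rather than an exact property check. The function names are hypothetical.
ZWNJ and ZWJ are U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH
JOINER).

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

# Approximation of ID_Start: letters plus letter-like numbers.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Approximation of ID_Continue: ID_Start plus marks, decimal digits and
# connector punctuation (which includes '_').
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_identifier_start(ch):
    """True if ch may begin an unquoted identifier."""
    return ch in ("$", "_") or unicodedata.category(ch) in START_CATEGORIES


def is_identifier_part(ch):
    """True if ch may appear after the first character of an identifier."""
    return (ch in ("$", ZWNJ, ZWJ)
            or unicodedata.category(ch) in CONTINUE_CATEGORIES)


def is_identifier(name):
    """Check a whole name (escape sequences already resolved)."""
    return (bool(name)
            and is_identifier_start(name[0])
            and all(is_identifier_part(c) for c in name[1:]))
```

Under these assumptions `is_identifier("a$$b")` and `is_identifier("$var")`
hold, while names containing a quote or a space are rejected.
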
>
>
> The patch is not so simple, but I believe that it's not too late to fix v12.
>
>
> [1]https://www.ecma-international.org/ecma-262/10.0/index.html#sec-ecmascript-language-lexical-grammar
> [2]https://www.ecma-international.org/ecma-262/10.0/index.html#sec-names-and-keywords
> [3]https://unicode.org/reports/tr31/

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment: 0001-Fix-parsing-of-identifiers-in-jsonpath-v02.patch (text/x-patch, 16.0 KB)
