Re: Fix parsing of identifiers in jsonpath

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Fix parsing of identifiers in jsonpath
Date: 2019-09-18 15:29:54
Lists: pgsql-hackers

On 9/18/19 11:10 AM, Nikita Glukhov wrote:

> 4. Even if the Unicode escape sequence '\uXXXX' is used, it cannot produce
>    special symbols or whitespace, because the identifiers are displayed
> ...
> I don't know if it is possible to check Unicode properties "ID_Start" and
> "ID_Continue" in Postgres, and what ZWNJ/ZWJ is.

ZWNJ (zero-width non-joiner) and ZWJ (zero-width joiner) are U+200C
and U+200D (mentioned in [1]).

Also, it's not just that a Unicode escape sequence can't make a
special symbol or whitespace; it can't make any character that's
not allowed there by the other rules:

"A UnicodeEscapeSequence cannot be used to put a code point into an
IdentifierName that would otherwise be illegal. In other words, if a \
UnicodeEscapeSequence sequence were replaced by the SourceCharacter it
contributes, the result must still be a valid IdentifierName that has
the exact same sequence of SourceCharacter elements as the original
IdentifierName. All interpretations of IdentifierName within this
specification are based upon their actual code points regardless of
whether or not an escape sequence was used to contribute any particular
code point."
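
To make that rule concrete, here is a minimal sketch in Python (not the
C that the jsonpath lexer would actually need). It decodes \uXXXX escapes
first and only then validates the resulting code points, so an escaped
character gets exactly the same treatment as a literal one. The names
decode_escapes and is_identifier_name are hypothetical, only the
four-digit escape form is handled, and the general-category sets are
merely an approximation of ID_Start and ID_Continue (the real properties
also depend on Other_ID_Start, Other_ID_Continue, and the Pattern_*
exclusions).

```python
import re
import unicodedata

ZWNJ, ZWJ = "\u200c", "\u200d"

# ASSUMPTION: general categories stand in for the real ID_Start and
# ID_Continue properties; this is close, but not exact.
_START_CATS = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
_CONTINUE_CATS = _START_CATS | {"Mn", "Mc", "Nd", "Pc"}

# Only the four-hex-digit form \uXXXX is recognized in this sketch.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_escapes(raw):
    """Replace each \\uXXXX escape with the code point it denotes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)


def is_identifier_name(raw):
    """Validate the decoded code points, ignoring how they were written."""
    name = decode_escapes(raw)
    if not name:
        return False
    first, rest = name[0], name[1:]
    if not (first in "$_" or unicodedata.category(first) in _START_CATS):
        return False
    return all(
        ch in ("$", ZWNJ, ZWJ) or unicodedata.category(ch) in _CONTINUE_CATS
        for ch in rest
    )
```

With this, r"\u0061bc" is accepted because it decodes to "abc", while
r"a\u0020b" is rejected because a literal space would be rejected in the
same position.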

A brief glance through src/backend/utils/mb/Unicode shows that the
Makefile does download a bunch of stuff, but maybe not the Unicode
character data that would allow testing ID_Start and ID_Continue?
I'm not sure.
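
For what it's worth, the Unicode Character Database lists ID_Start and
ID_Continue as code point ranges in DerivedCoreProperties.txt. Whether
the Makefile fetches that file is exactly what I have not checked, so
the following is only a sketch of how such data could be consumed,
written in Python for brevity. SAMPLE is a hand-copied excerpt in that
file's format, not the real file, and load_property and has_property are
hypothetical names.

```python
import bisect

# ASSUMPTION: a tiny hand-copied excerpt in the DerivedCoreProperties.txt
# format; the real file has many more ranges and properties.
SAMPLE = """\
0041..005A    ; ID_Start # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; ID_Start # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; ID_Start # Lo       FEMININE ORDINAL INDICATOR
0030..0039    ; ID_Continue # Nd  [10] DIGIT ZERO..DIGIT NINE
005F          ; ID_Continue # Pc       LOW LINE
"""


def load_property(lines, prop):
    """Collect sorted (first, last) code point ranges for one property."""
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data or ";" not in data:
            continue
        codepoints, name = (field.strip() for field in data.split(";", 1))
        if name != prop:
            continue
        lo, _, hi = codepoints.partition("..")
        ranges.append((int(lo, 16), int(hi or lo, 16)))
    ranges.sort()
    return ranges


def has_property(ranges, cp):
    """Binary-search the sorted ranges for one containing cp."""
    i = bisect.bisect_right(ranges, (cp, 0x10FFFF)) - 1
    return i >= 0 and ranges[i][0] <= cp <= ranges[i][1]


id_start = load_property(SAMPLE.splitlines(), "ID_Start")
id_continue = load_property(SAMPLE.splitlines(), "ID_Continue")
```

A C implementation would presumably generate a static sorted range table
from the same data and binary-search it in the same way.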

