Re: Q: Escapes in jsonpath Idents

From: Erik Wienhold <ewie(at)ewie(dot)name>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Q: Escapes in jsonpath Idents
Date: 2024-03-17 19:12:03
Message-ID: hozm6ofykt3bup7xaequgpf47y5hf2xca55rurazdt2gvhqn7u@d4gnloqbuo53
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David,

On 2024-03-16 19:39 +0100, David E. Wheeler wrote:
> The jsonpath doc[1] has an excellent description of the format of
> strings, but for unquoted path keys, it simply says:
>
> > Member accessor that returns an object member with the specified
> > key. If the key name matches some named variable starting with $ or
> > does not meet the JavaScript rules for an identifier, it must be
> > enclosed in double quotes to make it a string literal.
>
> I went looking for the JavaScript rules for an identifier and found
> this in the MDN docs[2]:
>
> > In JavaScript, identifiers can contain Unicode letters, $, _, and
> > digits (0-9), but may not start with a digit. An identifier differs
> > from a string in that a string is data, while an identifier is part
> > of the code. In JavaScript, there is no way to convert identifiers
> > to strings, but sometimes it is possible to parse strings into
> > identifiers.
>
>
> However, the Postgres parsing of jsonpath keys appears to follow the
> same rules as strings, allowing backslash escapes:
>
> david=# select '$.fo\u00f8 == $x'::jsonpath;
> jsonpath -------------------
> ($."foø" == $"x")
>
> This would seem to contradict the documentation. Is this behavior
> required by the SQL standard? Do the docs need updating? Or should the
> code actually follow the JSON identifier behavior?

That quoted MDN page does not give the whole picture. ECMAScript and JS
do allow Unicode escape sequences in identifier names:

https://262.ecma-international.org/#sec-identifier-names
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#identifiers

> PS: Those excellent docs on strings mentions support for \v, but the
> grammar in the right nav of https://www.json.org/json-en.html does
> not. Another bonus feature?

You refer to that sentence: "Other special backslash sequences include
those recognized in JSON strings: \b, \f, \n, \r, \t, \v for various
ASCII control characters, and \uNNNN for a Unicode character identified
by its 4-hex-digit code point."

Mentioning JSON and \v in the same sentence is wrong: JavaScript allows
that escape in strings but JSON doesn't. I think the easiest is to just
replace "JSON" with "JavaScript" in that sentence to make it right. The
paragraph also already says "embedded string literals follow JavaScript/
ECMAScript conventions", so mentioning JSON seems unnecessary to me.

The last sentence also mentions backslash escapes \xNN and \u{N...} as
deviations from JSON when in fact those are valid escape sequences from
ECMA-262: https://262.ecma-international.org/#prod-HexEscapeSequence
So I think it makes sense to reword the entire backslash part of the
paragraph and remove references to JSON entirely. The attached patch
does that and also formats the backslash escapes as a bulleted list for
readability.

> [1]: https://www.postgresql.org/docs/16/datatype-json.html#DATATYPE-JSONPATH
> [2]: https://developer.mozilla.org/en-US/docs/Glossary/Identifier

On 2024-03-16 21:33 +0100, David E. Wheeler wrote:
> On Mar 16, 2024, at 14:39, David E. Wheeler <david(at)justatheory(dot)com>
> wrote:
>
> > I went looking for the JavaScript rules for an identifier and found
> > this in the MDN docs[2]:
> >
> >> In JavaScript, identifiers can contain Unicode letters, $, _, and
> >> digits (0-9), but may not start with a digit. An identifier differs
> >> from a string in that a string is data, while an identifier is part
> >> of the code. In JavaScript, there is no way to convert identifiers
> >> to strings, but sometimes it is possible to parse strings into
> >> identifiers.
>
> Coda: Dollar signs don’t work at all outside double-quoted string
> identifiers:
>
> david=# select '$.$foo'::jsonpath;
> ERROR: syntax error at or near "$foo" of jsonpath input
> LINE 1: select '$.$foo'::jsonpath;
> ^
>
> david=# select '$.f$oo'::jsonpath;
> ERROR: syntax error at or near "$oo" of jsonpath input
> LINE 1: select '$.f$oo'::jsonpath;
> ^
>
> david=# select '$."$foo"'::jsonpath;
> jsonpath
> ----------
> $."$foo"
>
> This, too, contradicts the MDM definition an identifier (and some
> quick browser tests).

The first case ($.$foo) is in line with the restriction on member
accessors that you quoted first.

The error message 'syntax error at or near "$oo" of jsonpath input' for
the second case ($.f$oo), however, looks as if the scanner identifies
'$oo' as a variable instead of contiuing the scan of identifier (f$oo)
for the member accessor. Looks like a bug to me because a variable
doesn't even make sense in that place.

What works though, besides double quoting, is escaping the dollar sign:

regress=# select '$.\u0024foo'::jsonpath;
jsonpath
----------
$."$foo"
(1 row)

And we've come full circle :)

--
Erik

Attachment Content-Type Size
v1-0001-Simplify-docs-on-backslash-escapes-in-jsonpath.patch text/plain 2.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-03-17 19:21:12 Re: BitmapHeapScan streaming read user and prelim refactoring
Previous Message Noah Misch 2024-03-17 18:31:14 Re: Autogenerate some wait events code and documentation