Re: Define jsonpath functions as stable

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Define jsonpath functions as stable
Date: 2019-09-18 21:12:20
Message-ID: 10777.1568841140@sss.pgh.pa.us
Lists: pgsql-hackers

"Jonathan S. Katz" <jkatz(at)postgresql(dot)org> writes:
> On 9/17/19 6:40 PM, Tom Lane wrote:
>> After a re-read of the XQuery spec, it seems to me that the character
>> entry form that they have and we don't is actually "&#NNNN;" like
>> HTML, rather than just "#NN". Can anyone double-check that?

> Clicking through the XQuery spec eventually got me to here[1] (which warns
> me that it's out of date, but that is what its "current" specs linked me
> to), which describes being able to use "&#[0-9]+;" and "&#x[0-9a-fA-F]+;"
> to specify characters (which I recognize as a character escape from
> HTML, XML et al.).

After further reading, it seems like what that text is talking about
is not actually a regex feature, but an outgrowth of the fact that
the regex pattern is being expressed as a string literal in a language
for which XML character entities are a native aspect of the string
literal syntax. So it looks to me like the entities get folded to
raw characters in a string-literal parser before the regex engine
ever sees them.
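
The two-stage reading described above can be illustrated with a small sketch. This is only an analogy in Python, not XQuery or PostgreSQL code: html.unescape stands in for the string-literal parser that folds character references, Python's re module stands in for the regex engine, and fold_entities is a hypothetical helper name.

```python
import html
import re


def fold_entities(literal_text: str) -> str:
    """Stage 1 (stand-in for the string-literal parser): fold character
    references such as "&#65;" or "&#x41;" into raw characters."""
    return html.unescape(literal_text)


# Pattern text as it might be written inside a string literal.
literal_text = "&#65;+"

# Stage 2: the regex engine only ever sees the folded text, "A+".
pattern = fold_entities(literal_text)
assert pattern == "A+"
assert re.fullmatch(pattern, "AAA") is not None

# Handed the unfolded text, the regex engine knows nothing about entities:
# "&#65;+" just means the literal characters "&#65" followed by one or
# more ";" characters.
assert re.fullmatch(literal_text, "AAA") is None
assert re.fullmatch(literal_text, "&#65;;") is not None
```

If this reading is right, entity handling belongs to the host language's literal syntax and never reaches the regex layer at all.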

As such, I think this doesn't apply to SQL/JSON. The SQL/JSON spec
seems to defer to JavaScript/ECMAScript for syntax details, and
in either of those languages you have backslash escape sequences
for writing weird characters, *not* XML entities. You certainly
wouldn't have use of such entities in a native implementation of
LIKE_REGEX in SQL.

So now I'm thinking we can just remove the handwaving about entities.
On the other hand, this points up a large gap in our docs about
SQL/JSON, which is that nowhere does it even address the question of
what the string literal syntax is within a path expression. Much
less point out that that syntax is nothing like native SQL strings.
Good luck finding out from the docs that you'd better double any
backslashes you'd like to have in your regex --- but a moment's
testing proves that that is the case in our code as it stands.
Have we misread the spec badly enough to get this wrong?
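
To make the backslash-doubling point concrete, here is a rough sketch of why a string-literal layer in front of the regex engine forces the doubling. It is an analogy only: json.loads is used as a stand-in for an ECMAScript-style string literal decoder (an assumption, not the jsonpath parser itself), Python's re stands in for the regex engine, and decode_literal is a hypothetical helper name.

```python
import json
import re


def decode_literal(body: str) -> str:
    """Decode the text between the double quotes of a string literal using
    JSON escape rules (stand-in for an ECMAScript-style literal parser).
    Assumes body contains no unescaped double quote."""
    return json.loads('"' + body + '"')


# The user wants the regex \bfoo (word boundary, then "foo").

# Written with a single backslash, the literal layer consumes "\b" as a
# backspace character, so the regex engine never sees a word boundary.
single = decode_literal(r"\bfoo")
assert single == "\x08foo"
assert re.search(single, "a foo") is None

# Written with a doubled backslash, the literal layer yields one backslash,
# and the regex engine receives \bfoo as intended.
doubled = decode_literal(r"\\bfoo")
assert doubled == r"\bfoo"
assert re.search(doubled, "a foo") is not None
```

How the real path-expression parser treats escapes that the stand-in rejects or handles differently is not covered by this sketch.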

regards, tom lane
