Re: Define jsonpath functions as stable

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Define jsonpath functions as stable
Date: 2019-09-18 22:47:15
Message-ID: 5D82B3F3.9080808@anastigmatix.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09/18/19 17:12, Tom Lane wrote:

> After further reading, it seems like what that text is talking about
> is not actually a regex feature, but an outgrowth of the fact that
> the regex pattern is being expressed as a string literal in a language
> for which XML character entities are a native aspect of the string
> literal syntax. So it looks to me like the entities get folded to
> raw characters in a string-literal parser before the regex engine
> ever sees them.

Hmm. That occurred to me too, but I thought the explicit mention of
'character reference' in the section specific to regexes[1] might not
mean that. It certainly could have been clearer.

But you seem to have the practical agreement of both BaseX:

let $foo := codepoints-to-string((38,35,120,54,49,59))
return ($foo, matches('a', $foo))
------
&#x61;
false

and the Saxon-based pljava example:

select occurrences_regex('&#x61;', 'a', w3cNewlines => true);
occurrences_regex
-------------------
0

> As such, I think this doesn't apply to SQL/JSON. The SQL/JSON spec
> seems to defer to Javascript/ECMAscript for syntax details, and
> in either of those languages you have backslash escape sequences
> for writing weird characters, *not* XML entities. You certainly
> wouldn't have use of such entities in a native implementation of
> LIKE_REGEX in SQL.

So yeah, that seems to be correct.

The upshot seems to be a two-parter:

1. Whatever string literal syntax is used in front of the regex engine
had better have some way to represent any character you could want
to match, and
2. There is only one way to literally match a character that is a regex
metacharacter, namely, to precede it with a backslash (that the regex
engine will see; therefore doubled if necessary). Whatever codepoint
escape form might be available in the string literal syntax does not
offer another way to do that, because it happens too early, before
the regex engine can see it.

> So now I'm thinking we can just remove the handwaving about entities.
> On the other hand, this points up a large gap in our docs about
> SQL/JSON, which is that nowhere does it even address the question of
> what the string literal syntax is within a path expression.

That does seem like it ought to be covered.

Regards,
-Chap

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-09-18 22:51:09 Re: [PATCH] src/test/modules/dummy_index -- way to test reloptions from inside of access method
Previous Message Alvaro Herrera 2019-09-18 22:18:13 Re: Optimization of some jsonb functions