Re: Define jsonpath functions as stable

From: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Define jsonpath functions as stable
Date: 2019-09-16 17:36:29
Message-ID: 1149945c-8a31-ec24-b454-3e410f8a70b6@postgresql.org
Lists: pgsql-hackers

On 9/16/19 11:20 AM, Tom Lane wrote:
> "Jonathan S. Katz" <jkatz(at)postgresql(dot)org> writes:
>> It sounds like the easiest path to completion without potentially adding
>> future headaches or pushing back the release too far would be that, e.g.
>> these examples:
>
>> $.** ? (@ like_regex "O(w|v)" pg flag "i")
>> $.** ? (@ like_regex "O(w|v)" pg)
>
>> If it's using POSIX regexp, I would +1 using "posix" instead of "pg"
>
> I agree that we'd be better off to say "POSIX". However, having just
> looked through the references Chapman provided, it seems to me that
> the regex language Henry Spencer's library provides is awful darn
> close to what XPath is asking for. The main thing I see in the XML/XPath
> specs that we don't have is a bunch of character class escapes that are
> specifically tied to Unicode character properties. We could possibly
> add code to implement those, but I'm not sure how it'd work in non-UTF8
> database encodings.

Maybe we can take a page from the pg_saslprep implementation. In some
cases where the string in question would be rejected under normal
SASLprep[1] rules (really stringprep[2]), PostgreSQL simply passes the
string through to the next step unaltered.

What's implied here is that if the string is valid UTF-8, it goes
through SASLprep, but if not, it is just passed through.
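A minimal sketch of that "prepare if possible, otherwise pass through" pattern, in Python for brevity. The function name `prepare_password` is hypothetical, and the NFKC-plus-control-character check is only a simplified stand-in for the full SASLprep profile; it illustrates the fallback shape, not PostgreSQL's actual C code.

```python
import unicodedata


def prepare_password(raw: bytes) -> bytes:
    """Hypothetical sketch of a "prepare if possible, else pass through" step.

    Valid UTF-8 input that passes a *simplified* SASLprep check is
    normalized; anything else is returned unchanged for the next step.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: skip preparation entirely.
        return raw
    # Simplified stand-in for SASLprep: NFKC normalization plus rejection
    # of control characters. The real profile (RFC 4013 / RFC 3454) has
    # additional mapping and prohibition tables.
    normalized = unicodedata.normalize("NFKC", text)
    if any(unicodedata.category(ch) == "Cc" for ch in normalized):
        # Where SASLprep would reject the string, pass the original through.
        return raw
    return normalized.encode("utf-8")
```

Non-UTF-8 bytes and strings that the check would reject come back untouched, so the caller never fails at this step; only acceptable UTF-8 input is altered.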

So perhaps the answer is that, if we implement the XQuery regex
semantics, the escapes for Unicode character properties are honored only
when the database encoding is UTF-8, and ignored otherwise. We would have
to document that those escapes work only in UTF-8 databases.
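A rough sketch of the encoding gate, in Python for brevity. The function `has_property` is hypothetical, the property names are assumed to be Unicode general categories (as in XQuery's `\p{Lu}`), and "ignored" is modeled here as "matches nothing"; raising an error would be an equally plausible reading.

```python
import unicodedata


def has_property(ch: str, prop: str, db_encoding: str) -> bool:
    """Hypothetical check of one character against an XQuery-style
    \\p{prop} escape, where prop is a Unicode general category such as
    "Lu" (uppercase letter) or "L" (any letter).

    The escape is honored only when the database encoding is UTF8; in any
    other encoding it is treated as matching nothing, which is just one
    possible reading of "ignored".
    """
    if db_encoding != "UTF8":
        return False
    # A one-letter property like "L" covers every category starting with it.
    return unicodedata.category(ch).startswith(prop)
```

For example, `has_property("\u00c9", "Lu", "UTF8")` is true, while the same call with `"LATIN1"` is false, which is exactly the behavior difference that would need documenting.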

> There may also be subtle differences in the behavior
> of character class escapes that we do have in common, such as "\s" for
> white space; but again I'm not sure that those are any different than
> what you get naturally from encoding or locale variations.
>
> I think we could possibly get away with not having any special marker
> on regexes, but just explaining in the documentation that "features
> so-and-so are not implemented". Writing that text would require closer
> analysis than I've seen in this thread as to exactly what the differences
> are.

+1, and it would likely need some example strings that highlight the
differences in how they are processed.
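Tom's `\s` example is a natural place to start. Below is a rough sketch, in Python for brevity, that compares the XML Schema/XQuery definition of `\s` (exactly space, tab, newline, carriage return) with a broader Unicode whitespace class on a few sample characters. The sample set and the use of Python's `\s` as a proxy for a locale-dependent `[[:space:]]` are assumptions for illustration, not a statement of what PostgreSQL's regex engine matches.

```python
import re

# XML Schema (and hence XQuery/XPath) regexes define \s as exactly
# space, tab, newline, and carriage return.
XQUERY_SPACE = re.compile(r"[ \t\n\r]")
# Broader, Unicode-aware whitespace class; Python's \s is used here only
# as a stand-in for a locale-dependent [[:space:]], not as PostgreSQL's
# exact definition.
BROAD_SPACE = re.compile(r"\s")

SAMPLES = {
    "tab": "\t",
    "form feed": "\f",
    "vertical tab": "\v",
    "no-break space": "\u00a0",
}


def differing_samples():
    """Names of sample characters on which the two \\s definitions disagree."""
    return [
        name
        for name, s in SAMPLES.items()
        if bool(XQUERY_SPACE.fullmatch(s)) != bool(BROAD_SPACE.fullmatch(s))
    ]
```

Here tab matches under both definitions, while form feed, vertical tab, and no-break space match only the broader class; strings like these are the kind that would make the documented differences concrete.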

And again, if we end up updating the behavior in the future, it becomes
a part of our standard deprecation notice at the beginning of the
release notes, though one that could require a lot of explanation.

Jonathan

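[1] RFC 4013, SASLprep: Stringprep Profile for User Names and Passwords, https://tools.ietf.org/html/rfc4013
[2] RFC 3454, Preparation of Internationalized Strings ("stringprep"), https://www.ietf.org/rfc/rfc3454.txt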
