Re: PostgreSQL 9.3.5 substring(text from pattern for escape) bug

From: Robert Schreiber <bobschreiber(at)charter(dot)net>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: PostgreSQL 9.3.5 substring(text from pattern for escape) bug
Date: 2019-05-13 09:36:30
Message-ID: 3db2d381-c94d-c6b8-10a8-7cf0ddda6c09@charter.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Just to be clear here. It seems to me that I am right in that the
leading/trailing Q should NOT be returned...

From Section 9.7.2 in the manual:

The |substring| function with three parameters, |substring(/string/
from /pattern/ for /escape-character/)|, provides extraction of a
substring that matches an SQL regular expression pattern. As with
SIMILAR TO, the specified pattern must match the entire data string,
or else the function fails and returns null. To indicate the part of
the pattern that should be returned on success, the pattern must
contain two occurrences of the escape character followed by a double
quote ("). The text matching the portion of the pattern between
these markers is returned.

In my mind I see this:

QMy Q NAmeQ  is interpreted as Q/My Q Name/Q rather than Q/My Q Name Q//.

bob

On 5/12/2019 12:27 AM, Andrew Gierth wrote:
>>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> > Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> >> I looked up the spec on this point. As far as I can see, we're not
> >> following it, but neither does the spec do what the OP wanted; in
> >> fact the result should have included the _leading_ Q as well as the
> >> trailing one.
>
> Tom> Huh, interesting. So we should be translating the initial
> Tom> substring to a non-greedy pattern. I believe Spencer's engine can
> Tom> handle that by sticking (?:...){1,1}? around it.
>
> Tom> Come to think of it, we probably need to be putting (?:...) around
> Tom> the trailing substring as well. I suspect what we're doing today
> Tom> produces non-spec results if "|" appears in the trailing part.
>
> Digging into it more:
>
> SUBSTRING(x FROM 'expr' FOR 'escape') is from sql92/sql99 and is gone by
> sql2008, replaced by SUBSTRING(x SIMILAR 'expr' ESCAPE 'escape'). sql99
> defines the matching rule using different language, but with the same
> actual effect (requiring shortest matches for the leading and trailing
> strings).
>
> Your suggested fix doesn't seem to work. If the leading/trailing
> substrings do not have | or parens in then it seems to work to wrap them
> in (?:(?:)??...), thanks to the rule that the first quantified atom in a
> subexpression sets the whole subexpression's greediness, but handling |
> or parens correctly seems harder.
>
> Are there any other dbs that implement this feature that we can compare
> against?
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-05-13 13:49:53 Re: BUG #15802: Comparison of a function returning boolean value using relational (<, >, <= or >=) operator
Previous Message PG Bug reporting form 2019-05-13 09:02:49 BUG #15803: Autocomplete issues in new 11.3 and 9.6.13 psql client