From: | "Foster, Russell" <Russell(dot)Foster(at)crl(dot)com> |
---|---|
To: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
Cc: | "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: 9.5.3: substring: regex greedy operator not picking up chars as expected |
Date: | 2016-08-15 12:55:16 |
Message-ID: | BLUPR0401MB1698BBBBDB72278AA6404D629D120@BLUPR0401MB1698.namprd04.prod.outlook.com |
Lists: | pgsql-bugs |
Hi David,
Must have missed that in the manual, but makes sense now. Somewhat strange behavior that a non-greedy quantifier basically ruins the rest of the expression for the greedy ones, but at least it’s working as designed. Thanks!
Russell
From: David G. Johnston [mailto:david(dot)g(dot)johnston(at)gmail(dot)com]
Sent: 15 August 2016 8:45 AM
To: Foster, Russell <Russell(dot)Foster(at)crl(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: [BUGS] 9.5.3: substring: regex greedy operator not picking up chars as expected
Working as documented.
https://www.postgresql.org/docs/9.5/static/functions-matching.html#POSIX-MATCHING-RULES
Specifically, this implementation considers greediness at a level higher than just the atom/expression. In a mixed "branch", if there is a non-greedy quantifier in the branch, the entire branch is non-greedy, which in many situations causes greedy atoms to behave non-greedily.
It might help to consider that there aren't really any explicit "greedy" operators like other engines have (i.e., ??, ?, ?+) but rather non-greedy (lazy) and default. The default inherits the non-greedy trait from its parent if applicable; otherwise it behaves greedily.
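As a minimal sketch of the rule described above: the PostgreSQL manual's section on regular expression matching rules states that a branch takes the greediness of the first quantified atom in it that has a greediness attribute, and it illustrates this with the two `XY1234Z` queries below. The result comments on those two queries are the ones given in the manual; the result on the third is the one reported in this thread, not independently re-run here.

```sql
-- Manual example: the first quantified atom (Y*) is greedy,
-- so the branch as a whole prefers the longest match.
SELECT SUBSTRING('XY1234Z', 'Y*([0-9]{1,3})');    -- manual gives: 123

-- Manual example: the first quantified atom (Y*?) is non-greedy,
-- so the branch as a whole prefers the shortest match.
SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');   -- manual gives: 1

-- Query from this thread: .*? comes first, so the whole branch is non-greedy
-- and [0-9]+ stops after a single digit.
SELECT substring('>772' from '.*?[0-9]+');        -- reported in this thread: >7
```

In each case the only difference that matters is which quantifier appears first in the branch; the later quantifier's own greediness only decides how the already-chosen overall match is divided among subexpressions.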
On Mon, Aug 15, 2016 at 7:53 AM, Foster, Russell <Russell(dot)Foster(at)crl(dot)com> wrote:
Hello,
For the following query:
select substring('>772' from '.*?[0-9]+')
The pattern itself is non-greedy because there is only a single branch and it has a non-greedy quantifier within it.
.*? matches ">" and [0-9]+ needs only a single character to produce a conforming non-greedy match.
I would expect the output to be ‘>772’, but it is ‘>7’. You can also see the expected result on https://regex101.com/, although I am aware not all regex processors work the same.
The following queries:
select substring('>772' from '^.*?[0-9]+$')
This is treated exactly the same as the above, but because of the ^ and $ anchors the shortest possible match is the entire string.
and:
select substring('>772' from '[0-9]+')
both return ‘>772’, which is expected. Could the less greedy operator on the left (.*?) be affecting the more greedy right one (+)?
Typo here? I'm not fluent with substring(regex).
Anyway, the entire RE (single branch) is now greedy, so the greedy [0-9]+ atom matches as many digits as possible.
David J.
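For readers who want the longest overall match while keeping a lazy prefix, the manual notes that the quantifiers {1,1} and {1,1}? can be used to force greediness or non-greediness on a subexpression or a whole RE. The sketch below shows the manual's own example first, with the result the manual gives. The second query is an adaptation of this thread's pattern; it is an assumption about how the same technique would apply here and has not been run against 9.5.3.

```sql
-- Manual example: wrapping the RE in (?: ... ){1,1} makes the RE as a whole greedy,
-- while the inner (.*?) still takes as little as it can.
SELECT regexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');  -- manual gives: {abc,01234,xyz}

-- Untested adaptation of the thread's pattern using the same wrapper.
-- Under the documented rules the overall match should be the longest one available.
SELECT substring('>772' from '(?:.*?[0-9]+){1,1}');
```

The non-capturing group `(?: ... )` is used so that substring() still returns the whole match rather than the contents of a capturing group.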