BUG #6381: Incorrect greediness behavior in certain regular expressions

From: code(at)phaedrusdeinus(dot)org
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #6381: Incorrect greediness behavior in certain regular expressions
Date: 2012-01-06 00:32:17
Message-ID: E1RixjF-0006gP-Dt@wrigleys.postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 6381
Logged by: john melesky
Email address: code(at)phaedrusdeinus(dot)org
PostgreSQL version: 9.1.1
Operating system: x86_64-pc-linux-gnu
Description:

This simple regexp returns the expected result (that is, (.*?) matches
'blahblah.com'):

=# select regexp_matches('http://blahblah.com/asdf',
'http://(.*?)(/|%2f|$)');
regexp_matches
------------------
{blahblah.com,/}

This more complex/complete version matches greedily, which is incorrect:

=# select regexp_matches('http://blahblah.com/asdf',
'http(s?)(:|%3a)(//|%2f%2f)(.*?)(/|%2f|$)');
regexp_matches
--------------------------------
{"",:,//,blahblah.com/asdf,""}

(That is, (.*?) matches 'blahblah.com/asdf')

The problem appears to be the inclusion of '$' in the final parenthesized
group. Without it, the non-greedy group matches as expected:

=# select regexp_matches('http://blahblah.com/asdf',
'http(s?)(:|%3a)(//|%2f%2f)(.*?)(/|%2f)');
regexp_matches
--------------------------
{"",:,//,blahblah.com,/}
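For comparison, here is a minimal sketch that runs the three patterns from this report unchanged through Python's `re` module, a backtracking, Perl-style engine where each `*?` is lazy on its own regardless of the other quantifiers in the pattern. It only illustrates the per-quantifier behavior the report expects. PostgreSQL uses a different regex engine, so this says nothing definitive about what `regexp_matches` is required to return. The helper name `first_match_groups` is hypothetical, chosen to mirror a single `regexp_matches` call without the 'g' flag.

```python
import re

# Subject string and the three patterns from the report, copied verbatim.
URL = "http://blahblah.com/asdf"
PATTERNS = {
    "simple": r"http://(.*?)(/|%2f|$)",
    "with_dollar": r"http(s?)(:|%3a)(//|%2f%2f)(.*?)(/|%2f|$)",
    "without_dollar": r"http(s?)(:|%3a)(//|%2f%2f)(.*?)(/|%2f)",
}


def first_match_groups(pattern: str, text: str) -> tuple:
    """Hypothetical helper: capture groups of the leftmost match, or () if none."""
    m = re.search(pattern, text)
    return m.groups() if m else ()


if __name__ == "__main__":
    for name, pat in PATTERNS.items():
        print(name, first_match_groups(pat, URL))
```

In this engine the lazy `(.*?)` stops at 'blahblah.com' for all three patterns, with or without '$' in the final group, which is the result the report expected from the second query.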
