Re: Pathological regexp match

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Michael Glaesemann <michael(dot)glaesemann(at)myyearbook(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pathological regexp match
Date: 2010-01-29 04:21:42
Message-ID: 20100129042142.GF1793@alvh.no-ip.org
Lists: pgsql-hackers

Michael Glaesemann wrote:

> However, as you point out, Postgres doesn't appear to take this into
> account:
>
> postgres=# select regexp_replace('oooZQoooAoooQooQooQooo', $r$(Z(Q)[^Q]*A.*(\2))$r$, $s$X$s$);
> regexp_replace
> ----------------
> oooXooo
> (1 row)
>
> postgres=# select regexp_replace('oooZQoooAoooQooQooQooo', $r$(Z(Q)[^Q]*A.*?(\2))$r$, $s$X$s$);
> regexp_replace
> ----------------
> oooXooo
> (1 row)

I think the reason for this is that the first * (the one in [^Q]*) is
greedy, and thus the entire expression is considered greedy. Making the
second * non-greedy does not make the RE as a whole non-greedy. Note
what the docs say:

The above rules associate greediness attributes not only with
individual quantified atoms, but with branches and entire REs
that contain quantified atoms. What that means is that the
matching is done in such a way that the branch, or whole RE,
matches the longest or shortest possible substring as a whole.
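A smaller illustration of the rule quoted above, as a sketch adapted from the example in the PostgreSQL manual's regular-expression matching-rules section. It assumes a server that has regexp_matches (8.3 or later) and uses dollar quoting, as in the queries in this thread, so that \d reaches the regex engine regardless of the standard_conforming_strings setting. The expected results are shown as comments.

```sql
-- Greedy RE: its first quantified atom, (.*), is greedy, so the overall
-- match is the longest one and \d+ is left with only the last digit.
SELECT regexp_matches('abc01234xyz', $r$(.*)(\d+)(.*)$r$);
-- {abc0123,4,xyz}

-- Making the first atom non-greedy makes the whole RE non-greedy:
-- the overall match shrinks to the shortest possible one, 'abc0'.
SELECT regexp_matches('abc01234xyz', $r$(.*?)(\d+)(.*)$r$);
-- {abc,0,""}

-- Wrapping the RE in (?:...){1,1} forces the whole RE to be greedy again,
-- while (.*?) inside it still matches as little as it can.
SELECT regexp_matches('abc01234xyz', $r$(?:(.*?)(\d+)(.*)){1,1}$r$);
-- {abc,01234,xyz}
```

By the same rules, {1,1}? should force the opposite: a non-greedy RE even when its first quantified atom is greedy.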

It's late here so I'm not sure if this is what you're looking for:

alvherre=# select regexp_replace('oooZQoooAoooQooQooQooo', $r$(Z(Q)[^Q]*?A.*(\2))$r$, $s$X$s$);
regexp_replace
----------------
oooXooQooQooo
(1 row)

(Obviously the non-greediness has moved somewhere else) :-(

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
