Re: regexp_replace not respecting greediness

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Ellmann <simon(dot)ellmann(at)tum(dot)de>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: regexp_replace not respecting greediness
Date: 2025-09-19 16:01:53
Message-ID: 2134893.1758297713@sss.pgh.pa.us
Lists: pgsql-bugs

Simon Ellmann <simon(dot)ellmann(at)tum(dot)de> writes:
> With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should match greedily:
> postgres=# SELECT REGEXP_REPLACE('jane.smith@example.com', '.*?@.*', 'ab');

This is correct according to the rules given at

https://www.postgresql.org/docs/current/functions-matching.html#POSIX-MATCHING-RULES

specifically that "A branch — that is, an RE that has no top-level |
operator — has the same greediness as the first quantified atom in it
that has a greediness attribute." Because of that, the RE as a whole
is non-greedy and will match the shortest, not the longest, amount of text
overall. The discussion in that manual section shows what to do
when you don't like the results.
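The documented overall-match rule (the earliest starting position wins, then the longest match for a greedy RE or the shortest match for a non-greedy one) can be sketched in Python. This is a hypothetical emulation, not PostgreSQL's regex engine. `pg_style_match_span` and `pg_style_replace` are illustrative names. The greediness flag is passed in by hand instead of being derived from the first quantified atom. Anchors and lookaround are not modeled.

```python
import re


def pg_style_match_span(pattern: str, text: str, greedy: bool):
    """Return (start, end) of the match the whole-RE rule would pick, or None.

    Hypothetical emulation: the leftmost start position wins; from that start
    the whole RE takes the longest match (greedy) or the shortest (non-greedy).
    Assumes the pattern has no anchors or lookaround, since endpos would make
    '$' match at each candidate end.
    """
    # re.DOTALL approximates PostgreSQL's default, where '.' also matches newline.
    rx = re.compile(pattern, re.DOTALL)
    for start in range(len(text) + 1):
        # Try candidate end positions longest-first (greedy) or shortest-first.
        if greedy:
            ends = range(len(text), start - 1, -1)
        else:
            ends = range(start, len(text) + 1)
        for end in ends:
            # fullmatch succeeds only if the pattern covers text[start:end] exactly.
            if rx.fullmatch(text, start, end):
                return start, end
    return None


def pg_style_replace(text: str, pattern: str, repl: str, greedy: bool) -> str:
    """Replace only the first match, like regexp_replace() without the 'g' flag."""
    span = pg_style_match_span(pattern, text, greedy)
    if span is None:
        return text
    start, end = span
    return text[:start] + repl + text[end:]


s = "jane.smith@example.com"
print(pg_style_replace(s, ".*?@.*", "ab", greedy=False))  # abexample.com
print(pg_style_replace(s, ".*?@.*", "ab", greedy=True))   # ab
```

With `greedy=False` the shortest match from position 0 is `jane.smith@`, so only that prefix is replaced. This should correspond to what PostgreSQL returns for the reported query. With `greedy=True` the whole input is replaced. The manual section linked above lists `{1,1}` and `{1,1}?` as quantifiers that force greediness or non-greediness on a subexpression or a whole RE.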

> Other database systems (e.g., DuckDB, Umbra) match the whole input:

If your complaint is "but it's not like Perl!", I suggest using
a plperl function to do your regexp work.
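For contrast, Python's `re` module uses Perl-style backtracking semantics, where each quantifier's greediness applies only to that quantifier. It therefore yields the whole-input match the report describes for other systems. This is an illustration in Python, not plperl code. It assumes the same test string as the report.

```python
import re

text = "jane.smith@example.com"

# Perl-style semantics: '.*?' stops at the first '@', but the trailing greedy
# '.*' still consumes the rest of the string, so the whole input is replaced.
perl_style = re.sub(r".*?@.*", "ab", text, count=1)
print(perl_style)  # ab

# Making only the tail lazy leaves the domain in place, which is the result
# PostgreSQL's whole-RE rule gives for the original pattern.
lazy_tail = re.sub(r".*?@.*?", "ab", text, count=1)
print(lazy_tail)  # abexample.com
```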

regards, tom lane
