From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Simon Ellmann <simon(dot)ellmann(at)tum(dot)de> |
Cc: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: regexp_replace not respecting greediness |
Date: | 2025-09-19 16:01:53 |
Message-ID: | 2134893.1758297713@sss.pgh.pa.us |
Lists: | pgsql-bugs |
Simon Ellmann <simon(dot)ellmann(at)tum(dot)de> writes:
> With the following regular expression, the second .* seems to match non-greedily although (if I am correct) it should match greedily:
> postgres=# SELECT REGEXP_REPLACE('jane.smith@example.com', '.*?@.*', 'ab');
This is correct according to the rules given at
https://www.postgresql.org/docs/current/functions-matching.html#POSIX-MATCHING-RULES
specifically that "A branch — that is, an RE that has no top-level |
operator — has the same greediness as the first quantified atom in it
that has a greediness attribute." Because of that, the RE as a whole
is non-greedy and will match the shortest, not the longest, amount of text
overall. The discussion in that manual section shows what to do
when you don't like the results.
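As a rough illustration of the rule quoted above, the sketch below contrasts the reported pattern with the workaround that manual section describes: wrapping the whole RE in a non-capturing group with a `{1,1}` quantifier, which counts as greedy and so makes the branch greedy overall. The comments describe what the documented rules predict, not captured output, so it is worth checking on your own server.

```sql
-- Reported pattern: the first quantified atom (.*?) is non-greedy, so the
-- RE as a whole is non-greedy and matches the shortest text, 'jane.smith@'.
SELECT regexp_replace('jane.smith@example.com', '.*?@.*', 'ab');

-- Workaround sketch: the outer {1,1} is a greedy quantifier, so the RE as a
-- whole prefers the longest match, while the inner .*? stays non-greedy.
SELECT regexp_replace('jane.smith@example.com', '(?:.*?@.*){1,1}', 'ab');
```

Under those rules the first query should leave `example.com` in place after the replacement text, while the second should replace the whole input.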
> Other database systems (e.g., DuckDB, Umbra) match the whole input:
If your complaint is "but it's not like Perl!", I suggest using
a plperl function to do your regexp work.
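For readers who want Perl semantics, a minimal sketch of such a function might look like the following. The name `perl_regexp_replace` is hypothetical, the sketch assumes the `plperl` extension is available on the server, and it treats the replacement as literal text (no backreferences) and replaces only the first match.

```sql
-- Assumes PL/Perl is installed on the server and may be enabled here.
CREATE EXTENSION IF NOT EXISTS plperl;

-- Hypothetical helper: replace the first match of a Perl regex in the input.
CREATE FUNCTION perl_regexp_replace(input text, pattern text, replacement text)
RETURNS text AS $$
    my ($input, $pattern, $replacement) = @_;
    # Perl decides greediness per quantifier, not per branch.
    $input =~ s/$pattern/$replacement/;
    return $input;
$$ LANGUAGE plperl IMMUTABLE STRICT;

-- With Perl's matching rules, .*? stops at the '@' and the trailing .* then
-- consumes the rest of the input, so the whole string is replaced.
SELECT perl_regexp_replace('jane.smith@example.com', '.*?@.*', 'ab');
```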
regards, tom lane