regexp_positions()

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: regexp_positions()
Date: 2021-02-27 19:51:27
Message-ID: 0aabac3c-9049-4c55-a82d-a70c5ba43d4d@www.fastmail.com
Lists: pgsql-hackers

Hi,

Finding all matches in a string is convenient using regexp_matches() with the 'g' flag.

But if you instead want to know the start and end positions of the occurrences,
you first have to call regexp_matches(...,'g') to get all matches,
and then iterate through the results, searching with strpos() and length()
repeatedly to find each start and end position.
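
For illustration only, here is a rough Python sketch of that two-pass workaround.
It is not PostgreSQL code: re.finditer() stands in for regexp_matches(...,'g'),
str.find() for strpos() and len() for length(). The function name and the
1-based, inclusive [start,end] convention are assumptions made for this sketch.

```python
import re


def positions_via_search(text, pattern):
    """Two-pass workaround: collect the matched strings first,
    then locate each one again with a plain substring search."""
    # Pass 1: only the matched text is kept, as regexp_matches(..., 'g') would return.
    matches = [m.group(0) for m in re.finditer(pattern, text)]

    ranges = []
    offset = 0  # 0-based index where the next substring search begins
    for matched in matches:
        start = text.find(matched, offset)  # stand-in for strpos()
        end = start + len(matched)          # stand-in for length()
        ranges.append([start + 1, end])     # assumed convention: 1-based, inclusive end
        offset = end
    return ranges


print(positions_via_search("foobarbequebaz", r"ba."))
```

Besides scanning the string twice, a search like this can land on the wrong
occurrence when the pattern depends on context. With the pattern a(?=b) and
the input 'aab', the regex matches the second 'a', but the substring search
finds the first one.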

Assuming regexp_matches() internally already knows the positions of the occurrences,
maybe we could add a regexp_ranges() function that returns a two-dimensional array
with all the [[start,end], ...] positions?
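
As a hedged sketch of the result shape such a function might have, the Python
below reads the offsets straight from the match objects instead of searching
again. The name regexp_ranges is taken from the text above, but the signature,
the 1-based start and the inclusive end are assumptions; nothing here
reflects an actual PostgreSQL implementation.

```python
import re


def regexp_ranges(text, pattern):
    """Return [[start, end], ...] for every match of pattern in text.

    Assumed convention: positions are 1-based and end is inclusive.
    """
    # Match.start() is 0-based and Match.end() is exclusive, so adding 1 to
    # start and leaving end unchanged yields 1-based inclusive positions.
    return [[m.start() + 1, m.end()] for m in re.finditer(pattern, text)]


print(regexp_ranges("foobarbequebaz", r"ba."))
```

Because the engine reports the offsets itself, context-dependent patterns are
handled correctly: regexp_ranges('aab', r'a(?=b)') yields the second 'a'.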

Some other databases have a singular regexp_position() function
that returns only the start position of the first match.
I don't think such a function adds much value on its own,
but if we add the plural one, then maybe the singular one should be added as well
for completeness, since we have array_position() and array_positions().

I just wanted to share this idea now, since there is currently a lot of other awesome work going on in the regex engine,
and to hear what others who are currently thinking a lot about regexes think of it.

/Joel
