Re: regexp_positions()

From: David Fetter <david(at)fetter(dot)org>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: regexp_positions()
Date: 2021-02-28 02:13:48
Message-ID: 20210228021347.GD17314@fetter.org
Lists: pgsql-hackers

On Sat, Feb 27, 2021 at 08:51:27PM +0100, Joel Jacobson wrote:
> Hi,
>
> Finding all matches in a string is convenient using regexp_matches() with the 'g' flag.
>
> But to get the start and end positions of the occurrences instead,
> one has to first call regexp_matches(...,'g') to get all matches,
> and then iterate through the results, searching with strpos() and length()
> repeatedly to find all start and end positions.
>
> Assuming regexp_matches() internally already has knowledge of the occurrences,
> maybe we could add a regexp_ranges() function that returns a two-dimensional array,
> with all the [[start,end], ...] positions?

Maybe an int4multirange, which would fit unless I'm misunderstanding
g's meaning with respect to non-overlapping patterns, but that might
be a little too cute and hard to extend later.
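
To make the non-overlapping point concrete, here is a minimal sketch using
Python's re module as a stand-in for the 'g' flag. It is only an analogy:
Python's engine is not the Postgres one, and match_spans is a hypothetical
helper name. Successive matches never overlap, but they can be adjacent,
which matters because a multirange holds only non-contiguous ranges, so
adjacent matches would be merged into one.

```python
import re

def match_spans(pattern, text):
    """Return half-open (start, end) offsets of each successive match.

    Hypothetical helper; re.finditer scans left to right and yields
    non-overlapping matches, roughly like regexp_matches(..., 'g').
    """
    return [m.span() for m in re.finditer(pattern, text)]

spans = match_spans(r"ab", "ababxab")

# Pairs of consecutive matches where one ends exactly where the next begins.
adjacent = [(a, b) for a, b in zip(spans, spans[1:]) if a[1] == b[0]]

print(spans)
print(adjacent)
```

Here the first two matches touch, so a representation that merges contiguous
ranges would lose the boundary between them.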

Come to that, would a row structure that looked like

(match, start, end)

be useful?
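
A rough sketch of what such rows could look like, again in Python rather
than SQL and purely for illustration. The function name regexp_positions
and the MatchRow type are hypothetical, and the 1-based start with an
inclusive end is an assumption chosen to line up with strpos(), not a
settled design.

```python
import re
from typing import List, NamedTuple

class MatchRow(NamedTuple):
    match: str
    start: int  # assumed 1-based, like strpos()
    end: int    # assumed 1-based and inclusive

def regexp_positions(pattern: str, text: str) -> List[MatchRow]:
    """Return one (match, start, end) row per non-overlapping match."""
    rows = []
    for m in re.finditer(pattern, text):
        # Convert Python's 0-based half-open span to 1-based inclusive.
        # Note: an empty match would give end == start - 1 here.
        rows.append(MatchRow(m.group(0), m.start() + 1, m.end()))
    return rows

rows = regexp_positions(r"ba.", "foobarbequebaz")
for row in rows:
    print(row)
```

Unlike a bare array of positions, a row shape leaves room to add columns
later, such as captured subexpressions.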

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
