Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Isaac Morland" <isaac(dot)morland(at)gmail(dot)com>
Cc: "Mark Dilger" <mark(dot)dilger(at)enterprisedb(dot)com>, "Postgres hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Andreas Karlsson" <andreas(at)proxel(dot)se>, "David Fetter" <david(at)fetter(dot)org>
Subject: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]
Date: 2021-03-02 14:21:19
Message-ID: 4f18cca7-5e7f-4f0b-92f1-ca3e2ad9f693@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 2, 2021, at 15:05, Isaac Morland wrote:
> Suppose the match results are:
>
> [4,8)
> [10,10)
> [13,16)
> [20,20)
> [24,24)
>
> Then this gets turned into:
>
> [4,8)
> empty
> [13,16)
> empty
> empty
>
> So you know that there are non-empty matches from 4-8 and 13-16, plus an empty match between them and two empty matches at the end. Given that all empty strings are identical, I think it's only in pretty rare circumstances where you need to know exactly where the empty matches are; it would have to be a matter of identifying empty matches immediately before or after a specific pattern; in which case I suspect it would usually be just as easy to match the pattern itself directly.
>
> Does this help?

Thanks, I see what you mean now.

I agree it's probably a corner-case,
but I think I would still prefer a complete solution by just returning setof two integer[] values,
instead of the cuter-but-only-partial solution of using the existing int4range[].

Even better would be if we could fix the range type so it could actually be used in this and other similar situations.

If so, then we could do:

SELECT r FROM regexp_positions('caaabaaabeee','(?<=b)a+','g') AS r;
r
-----------
{"[6,9)"}
(1 row)

SELECT r FROM regexp_positions('caaabaaabeee','(?<=b)','g') AS r;
r
---------
{empty}
{empty}
(2 rows)

SELECT lower(r[1]), upper(r[1]) FROM regexp_positions('caaabaaabeee','(?<=b)','g') AS r;
lower | upper
-------+-------
5 | 5
9 | 9
(2 rows)

/Joel

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2021-03-02 14:24:15 Re: [Patch] ALTER SYSTEM READ ONLY
Previous Message Pavel Borisov 2021-03-02 14:08:43 Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.