Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Mark Dilger" <mark(dot)dilger(at)enterprisedb(dot)com>, "Postgres hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Andreas Karlsson" <andreas(at)proxel(dot)se>, "David Fetter" <david(at)fetter(dot)org>, "Gilles Darold" <gilles(at)darold(dot)net>
Subject: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]
Date: 2021-03-04 16:53:42
Message-ID: fb1649b3-4c29-4858-9176-fbe7921da31b@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 4, 2021, at 16:40, Tom Lane wrote:
> "Joel Jacobson" <joel(at)compiler(dot)org> writes:
> > Having abandoned the cute idea that didn't work,
> > here comes a new patch with a regexp_positions() instead returning
> > setof record (start_pos integer[], end_pos integer[]).
>
> I wonder if a 2-D integer array wouldn't be a better idea,
> ie {{startpos1,length1},{startpos2,length2},...}. My experience
> with working with parallel arrays in SQL has been unpleasant.

I considered it, but I prefer two separate simple arrays for two reasons:

a) more pedagogic, it's at least then obvious what values are start and end positions,
then you only have to understand what the values means.

b) 2-D doesn't arrays don't work well with unnest().
If you would unnest() the 2-D array you couldn't separate the start positions from the end positions,
whereas with two separate, you could do:

SELECT unnest(start_pos) AS start_pos, unnest(end_pos) AS end_pos FROM regexp_positions('foobarbequebazilbarfbonk', $re$(b[^b]+)(b[^b]+)$re$, 'g');
start_pos | end_pos
-----------+---------
3 | 6
6 | 11
11 | 16
16 | 20
(4 rows)

Can give some details on your unpleasant experiences of parallel arrays?

>
> Also, did you see
>
> https://www.postgresql.org/message-id/fc160ee0-c843-b024-29bb-97b5da61971f%40darold.net
>
>
> Seems like there may be some overlap in these proposals.

Yes, I saw it, it was sent shortly after my proposal, so I couldn't take it into account.
Seems useful, except regexp_instr() seems redundant, I would rather have regexp_positions(),
but maybe regexp_instr() should also be added for compatibility reasons.

/Joel

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Gilles Darold 2021-03-04 16:55:20 Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]
Previous Message Jacob Champion 2021-03-04 16:51:55 Re: Confusing behavior of psql's \e