Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joel Jacobson" <joel(at)compiler(dot)org>
Cc: "Mark Dilger" <mark(dot)dilger(at)enterprisedb(dot)com>, "Postgres hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Andreas Karlsson" <andreas(at)proxel(dot)se>, "David Fetter" <david(at)fetter(dot)org>, "Gilles Darold" <gilles(at)darold(dot)net>, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]
Date: 2021-03-09 16:42:53
Message-ID: 2033427.1615308173@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Joel Jacobson" <joel(at)compiler(dot)org> writes:
> Tom - can you please give details on your unpleasant experiences with parallel arrays?

The problems I can recall running into were basically down to not having
an easy way to iterate through parallel arrays. There are ways to do
that in SQL, certainly, but they all constrain how you write the query,
and usually force ugly stuff like splitting it into sub-selects.

As an example, presuming that regexp_positions is defined along the
lines of

regexp_positions(str text, pat text, out starts int[], out lengths int[])
returns setof record

then to actually get the identified substrings you'd have to do something
like

select
substring([input string] from starts[i] for lengths[i])
from
regexp_positions([input string], [pattern]) r,
lateral
generate_series(1, array_length(starts, 1)) i;

I think the last time I confronted this, we didn't have multi-array
UNNEST. Now that we do, we can get rid of the generate_series(),
but it's still not beautiful:

select
substring([input string] from s for l)
from
regexp_positions([input string], [pattern]) r,
lateral
unnest(starts, lengths) u(s,l);

Having said that, the other alternative with a 2-D array:

regexp_positions(str text, pat text) returns setof int[]

seems to still need UNNEST, though now it's not the magic multi-array
UNNEST but this slicing version:

select
substring([input string] from u[1] for u[2])
from
regexp_positions([input string], [pattern]) r,
lateral
unnest_slice(r, 1) u;

Anyway, I'd counsel trying to write out SQL implementations
of regexp_matches() and other useful things based on any
particular regexp_positions() API you might be thinking about.
Can we do anything useful without a LATERAL UNNEST thingie?
Are some of them more legible than others?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2021-03-09 17:03:31 Re: [patch] [doc] Minor variable related cleanup and rewording of plpgsql docs
Previous Message Chapman Flack 2021-03-09 16:41:22 Re: [PATCH] pg_permissions