Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Dilger" <mark(dot)dilger(at)enterprisedb(dot)com>, "Postgres hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Andreas Karlsson" <andreas(at)proxel(dot)se>, "David Fetter" <david(at)fetter(dot)org>
Subject: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]
Date: 2021-03-09 08:01:16
Message-ID: 37f841db-a910-4603-bc48-261cbc4f048b@www.fastmail.com
Lists: pgsql-hackers

On Tue, Mar 9, 2021, at 08:26, Pavel Stehule wrote:
> there are two ideas:
>
> 1. the behaviour can be same like SLICE clause of FOREACH statement

Hm, I'm sorry, I don't understand. Is there an existing SLICE clause?
I get a syntax error on HEAD:

ERROR: syntax error at or near "$2"
LINE 5: FOREACH r SLICE $2 IN ARRAY $1 --- now $2 should be consta...

Or are you suggesting that we add such a clause?

> 2. use unnest_slice as name - the function "unnest" is relatively rich today and using other overloading doesn't look too practical.

Hm, rich in what way? There is currently only one version for arrays, and a different one for tsvector.

> But this is just an idea. I can imagine more forms of slicing or unnesting, so it can be practical to use different names than just "unnest".
>
> Personally I don't like too much using 2D arrays for this purpose. The queries over this functionality will be harder to read (it is like fortran 77). I understand so now, there is no other possibility, because pg cannot build array type from function signature. So it is harder to build an array of record types.
>
> We can make an easy tuple store of records - like FUNCTION fx(OUT a int, OUT b int) RETURNS SETOF RECORD. But now, thanks to Tom and Amit's work, the simple expression evaluation is significantly faster than SQL evaluation. So using any SRF function has performance impact. What I miss is the possibility to write functions like FUNCTION fx(OUT a int, OUT b int) RETURNS ARRAY. With this possibility is easy to write functions that you need, and is not necessary to use 2d arrays. If the result of regexp functions will be arrays of records, then a new unnest function is not necessary. So this is not a good direction. Instead of fixing core issues, we design workarounds. There can be more wide usage of arrays of composites.

Hm, I struggle to understand what your point is.
2D arrays already exist, and when you have to deal with them, I think unnest(anyarray,int) would improve the situation.
Now, there might be other situations, like the ones you describe, where something other than 2D arrays is preferred.
But this doesn't change the fact that you sometimes have to deal with 2D arrays, in which case the proposed unnest(anyarray,int) would improve the user experience a lot when you want to unnest just one level (or N levels).
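
To illustrate what "unnest just one level" means here, a minimal sketch in PL/pgSQL follows. The function name unnest_one_level is hypothetical and is not the proposed unnest(anyarray,int); it only assumes that FOREACH ... SLICE is given a constant slice depth, since a parameter such as $2 is rejected there.

```sql
-- Hypothetical helper for illustration only, not part of PostgreSQL.
-- Returns each innermost 1-D slice of the input array as its own row.
CREATE FUNCTION unnest_one_level(arr anyarray)
RETURNS SETOF anyarray
LANGUAGE plpgsql AS $$
DECLARE
    r arr%TYPE;  -- same array type as the argument
BEGIN
    -- SLICE 1 iterates over 1-D slices; the depth must be a constant.
    FOREACH r SLICE 1 IN ARRAY arr LOOP
        RETURN NEXT r;
    END LOOP;
END;
$$;

-- For a 2-D input this should yield one row per inner array,
-- i.e. {1,2} and {3,4}, rather than the four scalars that
-- plain unnest() returns.
SELECT * FROM unnest_one_level(ARRAY[[1,2],[3,4]]);
```

Because the slice depth is fixed at 1, this only covers the "one level" case for 2D arrays; a general unnest(anyarray,int) would need the depth as an argument, which is what the sketch cannot express.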

It sounds like you are suggesting some other improvements in addition to the proposed unnest(anyarray,int). Correct?

A regexp_positions() returning setof 2-D array[] would not be a workaround, in my opinion.
It would be what I actually want, but only if I also get unnest(anyarray,int). Then I'm perfectly happy.

/Joel
