Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>, David Fetter <david(at)fetter(dot)org>
Subject: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]
Date: 2021-03-09 08:29:45
Message-ID: CAFj8pRCrwV+m3+YbBqpszKMcZv7m6SyiWKk+hWHkeJ1bPC2HEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

út 9. 3. 2021 v 9:01 odesílatel Joel Jacobson <joel(at)compiler(dot)org> napsal:

> On Tue, Mar 9, 2021, at 08:26, Pavel Stehule wrote:
>
> there are two ideas:
>
> 1. the behaviour can be same like SLICE clause of FOREACH statement
>
>
> Hm, I'm sorry I don't understand, is there an existing SLICE clause?
> I get syntax error in HEAD:
>
> ERROR: syntax error at or near "$2"
> LINE 5: FOREACH r SLICE $2 IN ARRAY $1 --- now $2 should be consta...
>
> Or do you mean you suggest adding such a clause?
>

https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-FOREACH-ARRAY

but the SLICE argument should be constant. But this limit is artificial,
just for implementation simplicity. Important is behaviour.

> 2. use unnest_slice as name - the function "unnest" is relatively rich
> today and using other overloading doesn't look too practical.
>
>
> Hm, rich in what way? There is currently only one version for arrays, and
> a different one for tsvector.
>

no, there is possible to unnest more arrays once

> But this is just an idea. I can imagine more forms of slicing or
> unnesting, so it can be practical to use different names than just "unnest".
>
> Personally I don't like too much using 2D arrays for this purpose. The
> queries over this functionality will be harder to read (it is like fortran
> 77). I understand so now, there is no other possibility, because pg cannot
> build array type from function signature. So it is harder to build an array
> of record types.
>
> We can make an easy tuple store of records - like FUNCTION fx(OUT a int,
> OUT b int) RETURNS SETOF RECORD. But now, thanks to Tom and Amit's work,
> the simple expression evaluation is significantly faster than SQL
> evaluation. So using any SRF function has performance impact. What I miss
> is the possibility to write functions like FUNCTION fx(OUT a int, OUT b
> int) RETURNS ARRAY. With this possibility is easy to write functions that
> you need, and is not necessary to use 2d arrays. If the result of regexp
> functions will be arrays of records, then a new unnest function is not
> necessary. So this is not a good direction. Instead of fixing core issues,
> we design workarounds. There can be more wide usage of arrays of composites.
>
>
> Hm, I struggle to understand what your point is.
> 2D arrays already exist, and when having to deal with them, I think
> unnest(anyarray,int) would improve the situation.
>

I cannot find any function in Postgres that returns a 2D array now.

For me - using 2D arrays is not a win. It is not a bad solution, but I
cannot say, so I like it, because it is not a good solution. For example,
you cannot enhance this functionality about returning searched substring.
So you need to repeat searching. I have bad experience with using arrays in
this style. Sometimes it is necessary, because external interfaces cannot
work with composites, but the result is unreadable. So this is the reason
for my opinion.

about unnest_2d .. probably it can be used for some cases when users cannot
use composites on the client side. But now, because they can use FOREACH
SLICE is not problem to write any custom function like they exactly need.
And in this case there is very low overhead of plpgsql. But it is true, so
this function can be used for some vector unnesting.

Now, there might be other situations like you describe where something else
> than 2D arrays are preferred.
> But this doesn't change the fact you sometimes have to deal with 2D
> arrays, in which case the proposed unnest(anyarray,int) would improve the
> user-experience a lot, when wanting to unnest just one level (or N levels).
>

> Sounds like you are suggesting some other improvements, in addition to the
> proposed unnest(anyarray,int)? Correct?
>
> A regexp_positions() returning setof 2-D array[] would not be a
> workaround, in my opinion,
> it would be what I actually want, but only if I also get
> unnest(anyarray,int), then I'm perfectly happy.
>

We are talking about design, not about usage. try to write some examples of
usage, please?

Regards

Pavel

> /Joel
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-03-09 08:39:43 Re: Make stream_prepare an optional callback
Previous Message Yugo NAGATA 2021-03-09 08:27:50 Re: Implementing Incremental View Maintenance