Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Joel Jacobson <joel(at)compiler(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>, David Fetter <david(at)fetter(dot)org>
Subject: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]
Date: 2021-03-09 09:18:01
Message-ID: CAFj8pRAomDSpQmfmy_8JmRLyCMbDjXkW-oYAAdW=-PfxEWtjWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

út 9. 3. 2021 v 10:01 odesílatel Joel Jacobson <joel(at)compiler(dot)org> napsal:

> On Tue, Mar 9, 2021, at 09:29, Pavel Stehule wrote:
>
>
>
> https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-FOREACH-ARRAY
>
> but the SLICE argument should be constant. But this limit is artificial,
> just for implementation simplicity. Important is behaviour.
>
>
> I see now what you mean. Yes, being able to specify the SLICE argument as
> a variable instead of a constant would be a good improvement. Maybe the
> SLICE implementation from PL/pgSQL could be modified and used for both
> cases? (Both in the C-version unnest() and in PL/pgSQL to allow variables
> and not just constants to SLICE)
>

probably

>
>
>
> 2. use unnest_slice as name - the function "unnest" is relatively rich
> today and using other overloading doesn't look too practical.
>
>
> Hm, rich in what way? There is currently only one version for arrays, and
> a different one for tsvector.
>
>
> no, there is possible to unnest more arrays once
>
>
> What do you mean?
> More than one unnest() in the same query, e.g. SELECT unnest(..),
> unnest(..)?
>

you can do unnest(array1, array2, ...)

> I cannot find any function in Postgres that returns a 2D array now.
>
> For me - using 2D arrays is not a win. It is not a bad solution, but I
> cannot say, so I like it, because it is not a good solution. For example,
> you cannot enhance this functionality about returning searched substring.
> So you need to repeat searching. I have bad experience with using arrays in
> this style. Sometimes it is necessary, because external interfaces cannot
> work with composites, but the result is unreadable. So this is the reason
> for my opinion.
>
>
> Even if it would return arrays of a range record with "start" and "stop"
> field, I don't see how we could enhance it to later return searched
> substring without changing the return type? Doing so would break any code
> using the function anyway.
>

you can have composite (position int, value text) or (position int,
offset_bytes int, size_char int, size_bytes int), ... just there are more
possibilities

> Repeating searching if you want something else than positions, seems like
> the most SQL-idiomatic solution.
>

:)

yes, but usually you can use index on bigger data. Substring searching is
slower, and we use mostly UTF8, so if start is not in bytes, then you have
to iterate from start of string to find start of substring.

> /Joel
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tsunakawa.takay@fujitsu.com 2021-03-09 09:19:47 RE: POC: Cleaning up orphaned files using undo logs
Previous Message Dilip Kumar 2021-03-09 09:14:41 Re: [HACKERS] Custom compression methods