Re: patch adding new regexp functions

From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: patch adding new regexp functions
Date: 2007-02-17 08:23:17
Message-ID: Pine.BSO.4.64.0702170005560.18849@resin.csoft.net
Lists: pgsql-hackers pgsql-patches

On Sat, 17 Feb 2007, Peter Eisentraut wrote:

> Jeremy Drake wrote:
> > In case you haven't noticed, I am rather averse to making this return
> > text[] because it is much easier in my experience to use the results
> > when returned in SETOF rather than text[],
>
> The primary use case I know for string splitting is parsing
> comma/pipe/whatever separated fields into a row structure, and the way
> I see it your API proposal makes that exceptionally difficult.

For this case see string_to_array:
http://developer.postgresql.org/pgdocs/postgres/functions-array.html
select string_to_array('a|b|c', '|');
 string_to_array
-----------------
 {a,b,c}
(1 row)
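
To get from that array to a row structure, one option is to subscript it in a subselect. This is only a sketch and not part of the patch; the column aliases and the subselect alias are made up for illustration:

```sql
-- Split once in the subselect, then pick fields by position.
-- PostgreSQL arrays are 1-based by default.
SELECT f[1] AS first_field,
       f[2] AS second_field,
       f[3] AS third_field
FROM (SELECT string_to_array('a|b|c', '|') AS f) AS parts;
```

A subscript past the end of the array gives NULL rather than an error, so short input lines do not need special handling.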

> I don't know what your use case is, though. All of this is missing
> actual use cases.

The particular use case I had for this function was at a previous
employer, and I am not sure exactly how much detail is appropriate to
divulge. Basically, the project was doing some text processing inside of
postgres, and getting all of the words from a string into a table with
some processing (excluding stopwords and so forth) as efficiently as
possible was a big concern.

The regexp_split function code was based on some code that a friend of
mine wrote which used PCRE rather than postgres' internal regexp support.
I don't know exactly what his use-case was, but he probably had
one because he wrote the function and had it returning SETOF text ;)
Perhaps he can share a general idea of what it was (nudge nudge)?

> > While, if you
> > really really wanted a text[], you could use the (fully documented)
> > ARRAY(select resultstr from regexp_split(...) order by startpos)
> > construct.
>
> I think, however, that we should be providing simple primitives that can
> be combined into complex expressions rather than complex primitives
> that have to be dissected apart to get simple results.

The simplest primitive is string_to_array(text, text), which returns text[],
but it was not sufficient for our needs.
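
As a rough illustration of where a literal delimiter falls short for word extraction (the sample string is made up, and this is my guess at the kind of input involved, not the actual data):

```sql
-- string_to_array matches the delimiter literally, so the two adjacent
-- spaces between 'quick' and 'brown' yield an empty-string element, and
-- the comma stays attached to 'fox,'.
SELECT string_to_array('the quick  brown fox, again', ' ');
```

A regexp-based splitter can treat any run of whitespace or punctuation as a single separator, which a fixed delimiter string cannot express.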

> > > As for the regexp_matches() function, it seems to me that it
> > > returns too much information at once. What is the use case for
> > > getting all of prematch, fullmatch, matches, and postmatch in one
> > > call?
> >
> > It was requested by David Fetter:
> > http://archives.postgresql.org/pgsql-hackers/2007-02/msg00056.php
> >
> > It was not horribly difficult to provide, and it seemed reasonable to
> > me. I have no need for them personally.
>
> David Fetter has also repeatedly failed to offer a use case for this, so I
> hesitate to accept this.

I have no strong opinion either way, so I will let those who do argue it
out and wait for the dust to settle ;)

--
The Law, in its majestic equality, forbids the rich, as well as the
poor, to sleep under the bridges, to beg in the streets, and to steal
bread.
-- Anatole France
