Re: [HACKERS] writing new regexp functions

From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] writing new regexp functions
Date: 2007-02-04 21:00:12
Message-ID: Pine.BSO.4.64.0702041254011.28908@resin.csoft.net
Lists: pgsql-hackers pgsql-patches

On Sun, 4 Feb 2007, David Fetter wrote:

> On Fri, Feb 02, 2007 at 07:01:33PM -0800, Jeremy Drake wrote:
>
> > Let me know if you see any bugs or issues with this code, and I am
> > open to suggestions for further regression tests ;)
>
> > Things that I still want to look into:
> > * regexp flags (a la regexp_replace).
>
> One more text field at the end is how the regexp_replace() one does
> it.

That's how I did it.

> > * maybe make regexp_matches return setof whatever, if given a 'g' flag
> > return all matches in string.
>
> This is doable with current machinery, albeit a little clumsily.

I have implemented this too.

> > * maybe a join function that works as an aggregate
> > SELECT join(',', col) FROM tbl
> > currently can be written as
> > SELECT array_to_string(ARRAY(SELECT col FROM tbl), ',')
>
> The array_accum() aggregate in the docs works OK for this purpose.

I have not tackled this yet; I think it may be better to stick with the
ARRAY() construct for now.

So, here is the new version of the code, and also a new version of the
patch to core, which fixes some compile warnings that I did not see at
first because I was using ICC rather than GCC.

Here is the README.regexp_ext from the tar file:

This package contains regexp functions beyond those currently provided
in core PostgreSQL, utilizing the regexp engine built into core. This
is still a work-in-progress.

The most recent version of this code can be found at
http://www.jdrake.com/postgresql/regexp/regexp_ext.tar.gz
and the prerequisite patch to PostgreSQL core, which has been submitted
for review, can be found at
http://www.jdrake.com/postgresql/regexp/regexp-export.patch

The .tar.gz file expects to be untarred in contrib/. I have made some
regression tests that can be run using 'make installcheck' as normal for
contrib. I think they exercise the corner cases in the code, but I may
very well have missed some. The module requires the above-mentioned patch
to core in order to compile, as it takes advantage of new exported
functions from src/backend/utils/adt/regexp.c.

Let me know if you see any bugs or issues with this code, and I am open to
suggestions for further regression tests ;)

Functions implemented in this module:
* regexp_split(str text, pattern text) RETURNS SETOF text
  regexp_split(str text, pattern text, flags text) RETURNS SETOF text
    Returns each section of the string delimited by the pattern.
* regexp_matches(str text, pattern text) RETURNS text[]
    Returns an array of all capture groups from matching the pattern
    against the string.
* regexp_matches(str text, pattern text, flags text) RETURNS SETOF
    (prematch text, fullmatch text, matches text[], postmatch text)
    Returns an array of all capture groups from matching the pattern
    against the string in matches, and the entire match in fullmatch.
    If the 'g' option is given, returns all matches in the string. If
    the 'r' option is given, also returns the text before and after the
    match in prematch and postmatch respectively.
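
For readers without the module installed, the semantics described above can
be modelled roughly as follows. This is a Python sketch of the documented
behaviour, not the module's C code, and it rests on several assumptions:
Python's re engine stands in for the PostgreSQL regexp engine, only the 'g'
and 'r' flags are modelled, the flags argument of regexp_split is left out,
prematch is taken to run from the start of the string, and fields that are
not requested are returned as None. Corner cases such as empty matches may
well behave differently in the real functions.

```python
import itertools
import re
from typing import Iterator, List, NamedTuple, Optional


class Match(NamedTuple):
    """One result row: (prematch, fullmatch, matches, postmatch)."""
    prematch: Optional[str]
    fullmatch: str
    matches: List[Optional[str]]
    postmatch: Optional[str]


def regexp_split(s: str, pattern: str) -> Iterator[str]:
    """Yield each section of s delimited by pattern."""
    pos = 0
    for m in re.finditer(pattern, s):
        yield s[pos:m.start()]
        pos = m.end()
    yield s[pos:]


def regexp_matches(s: str, pattern: str, flags: str = "") -> Iterator[Match]:
    """Yield one row per match: only the first match unless 'g' is given."""
    found = re.finditer(pattern, s)
    if "g" not in flags:
        # Without 'g', stop after the first match.
        found = itertools.islice(found, 1)
    want_context = "r" in flags
    for m in found:
        # Assumption: prematch is everything before this match,
        # postmatch is everything after it.
        pre = s[:m.start()] if want_context else None
        post = s[m.end():] if want_context else None
        yield Match(pre, m.group(0), list(m.groups()), post)


if __name__ == "__main__":
    print(list(regexp_split("a, b,c", r",\s*")))
    for row in regexp_matches("foo=1;bar=2", r"(\w+)=(\d+)", "gr"):
        print(row)
```

Generators are used here to mirror the SETOF return type: each yielded
value corresponds to one returned row.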

See the regression tests for more details about usage and return values.

Recent changes:
* I have put the pattern after the string in all of the functions, as
discussed on the pgsql-hackers mailing list.

* regexp flags (a la regexp_replace).

* make regexp_matches return setof whatever, if given a 'g' flag return
all matches in string.

Things that I still want to look into:
* maybe a join function that works as an aggregate
SELECT join(',', col) FROM tbl
currently can be written as
SELECT array_to_string(ARRAY(SELECT col FROM tbl), ',')
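
To make the idea concrete, a join aggregate can be thought of as a state
transition function that collects values plus a final function that joins
them, which is the general shape PostgreSQL aggregates take. The Python
sketch below only models that shape; join_sfunc, join_finalfunc and join_agg
are hypothetical names and nothing here is part of the module.

```python
from functools import reduce
from typing import Iterable, List


def join_sfunc(state: List[str], value: str) -> List[str]:
    """Transition step: accumulate one input value into the state."""
    state.append(value)
    return state


def join_finalfunc(state: List[str], delimiter: str) -> str:
    """Final step: turn the accumulated values into one delimited string."""
    return delimiter.join(state)


def join_agg(delimiter: str, rows: Iterable[str]) -> str:
    """Model of SELECT join(delimiter, col) FROM tbl over the given rows."""
    return join_finalfunc(reduce(join_sfunc, rows, []), delimiter)


if __name__ == "__main__":
    print(join_agg(",", ["a", "b", "c"]))
```

The accumulate-then-join split corresponds to what the ARRAY() subselect and
array_to_string() do in the query above.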

--
Philogeny recapitulates erogeny; erogeny recapitulates philogeny.

Attachment            Content-Type              Size
regexp-export.patch   text/plain                9.2 KB
regexp_ext.tar.gz     application/octet-stream  6.1 KB
