Re: [HACKERS] writing new regexp functions

From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] writing new regexp functions
Date: 2007-02-02 03:29:35
Message-ID: Pine.BSO.4.64.0702011914480.28908@resin.csoft.net
Lists: pgsql-hackers pgsql-patches

On Thu, 1 Feb 2007, Jeremy Drake wrote:

> On Thu, 1 Feb 2007, Tom Lane wrote:
>
> > Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > > Is there some specific reason that these functions are static,
> >
> > Yeah: not cluttering the global namespace.
>
> > Is there a reason for not putting your new code itself into regexp.c?
>
> Not really, I just figured it would be cleaner/easier to write it as an
> extension. I also figure that it is unlikely that every regexp function
> that anyone could possibly want will be implemented in core in that one
> file.
<snip>

> Anyway, the particular thing I was writing was a function like
> substring(str FROM pattern) which instead of returning just the first
> match group, would return an array of text containing all of the match
> groups. I exported the functions in my sandbox, and wrote a module with a
> function that does this.
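
For readers without the attachment, the idea described above can be
sketched in plain C. This is not the attached regexp_ext.c: it assumes the
POSIX <regex.h> API (regcomp/regexec) rather than the backend's
src/backend/regex code, and the names copy_range, free_strings and
match_groups are hypothetical. It returns every capture group of the first
match instead of only the first group.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Copy bytes [start, end) of s into a new NUL-terminated string. */
static char *
copy_range(const char *s, size_t start, size_t end)
{
    size_t len = end - start;
    char  *out = malloc(len + 1);

    if (out != NULL) {
        memcpy(out, s + start, len);
        out[len] = '\0';
    }
    return out;
}

/* Free an array of n strings returned by match_groups. */
static void
free_strings(char **v, size_t n)
{
    size_t i;

    if (v == NULL)
        return;
    for (i = 0; i < n; i++)
        free(v[i]);
    free(v);
}

/*
 * Return all capture groups of the first match of pattern in str.
 * *ngroups receives the number of groups in the pattern. An entry is NULL
 * when that group did not take part in the match (an allocation failure
 * for a single entry is not distinguished in this sketch).
 * Returns NULL, with *ngroups set to 0, on no match or a bad pattern.
 */
static char **
match_groups(const char *pattern, const char *str, size_t *ngroups)
{
    regex_t     re;
    regmatch_t *pm;
    char      **groups;
    size_t      n, i;

    *ngroups = 0;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;

    n = re.re_nsub;                       /* number of parenthesized groups */
    pm = malloc((n + 1) * sizeof(regmatch_t));
    groups = calloc(n ? n : 1, sizeof(char *));

    if (pm != NULL && groups != NULL &&
        regexec(&re, str, n + 1, pm, 0) == 0) {
        /* pm[0] is the whole match; groups start at pm[1] */
        for (i = 0; i < n; i++) {
            if (pm[i + 1].rm_so >= 0)
                groups[i] = copy_range(str, (size_t) pm[i + 1].rm_so,
                                       (size_t) pm[i + 1].rm_eo);
        }
        *ngroups = n;
    } else {
        free(groups);
        groups = NULL;
    }
    free(pm);
    regfree(&re);
    return groups;
}
```

With this sketch, match_groups("([a-z]+)=([0-9]+)", "key=42;other=7", &n)
yields the two strings "key" and "42"; a SQL-level version would return
them as the elements of a text[].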

I have attached the patch I have put together, which does the following:
* Expose the previously static RE_* functions from regexp.c which wrap
the code in src/backend/regex with postgres-style errors, string
conversion, and caching of patterns.

* Expose the regex_flavor GUC variable, which is needed to know how to
interpret patterns when compiling them.

* Add a couple more RE_* functions in regexp.c to provide access
to different levels of the process, which were necessary to avoid
duplicating effort elsewhere.

* Update replace_text_regexp in varlena.c to use newly exposed functions
from regexp.c instead of duplicating error handling code from there.
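
To illustrate what "caching of patterns" with error reporting means for a
wrapper like the RE_* functions listed above, here is a minimal sketch. It
is not the code in regexp.c: it assumes the POSIX <regex.h> API, a tiny
fixed-size cache with round-robin eviction, and hypothetical names
(cached_re, re_compile_cached, RE_CACHE_SIZE). Errors are returned as a
message in a caller-supplied buffer rather than raised postgres-style.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RE_CACHE_SIZE 4          /* hypothetical size, small for illustration */

typedef struct {
    char   *pattern;             /* owned copy of the pattern; NULL = free slot */
    int     cflags;              /* regcomp flags the pattern was compiled with */
    regex_t re;
} cached_re;

static cached_re re_cache[RE_CACHE_SIZE];
static int re_next_victim = 0;   /* round-robin eviction cursor */
static int re_compile_count = 0; /* number of regcomp calls, for demonstration */

/*
 * Return a compiled regex for (pattern, cflags), compiling only on a miss.
 * On failure, writes a message into errbuf and returns NULL.
 * The returned pointer stays valid only until its slot is evicted.
 */
static regex_t *
re_compile_cached(const char *pattern, int cflags, char *errbuf, size_t errlen)
{
    int    i, rc, slot = -1;
    size_t len = strlen(pattern) + 1;
    char  *copy;

    for (i = 0; i < RE_CACHE_SIZE; i++) {
        if (re_cache[i].pattern == NULL) {
            if (slot < 0)
                slot = i;                    /* remember the first free slot */
        } else if (re_cache[i].cflags == cflags &&
                   strcmp(re_cache[i].pattern, pattern) == 0) {
            return &re_cache[i].re;          /* cache hit: no recompilation */
        }
    }

    copy = malloc(len);
    if (copy == NULL) {
        snprintf(errbuf, errlen, "out of memory");
        return NULL;
    }
    memcpy(copy, pattern, len);

    if (slot < 0) {                          /* cache full: evict one entry */
        slot = re_next_victim;
        re_next_victim = (re_next_victim + 1) % RE_CACHE_SIZE;
        regfree(&re_cache[slot].re);
        free(re_cache[slot].pattern);
        re_cache[slot].pattern = NULL;
    }

    re_compile_count++;
    rc = regcomp(&re_cache[slot].re, pattern, cflags);
    if (rc != 0) {
        /* turn the error code into text; the slot stays marked free */
        regerror(rc, &re_cache[slot].re, errbuf, errlen);
        free(copy);
        return NULL;
    }
    re_cache[slot].pattern = copy;
    re_cache[slot].cflags = cflags;
    return &re_cache[slot].re;
}
```

Calling re_compile_cached twice with the same pattern and flags returns the
same regex_t and runs regcomp only once, which is the saving a shared
wrapper gives to every caller that would otherwise compile on its own.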

Also attached is the function I wrote to retrieve all of the capture
groups of a pattern match as a text[]. I also intend to put together a
function analogous to split_part which will take a string and a pattern to
split on, and return setof text.
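
The planned split function could behave roughly like the sketch below. It
is only an illustration under assumptions: it uses the POSIX <regex.h> API
instead of the backend regex code, returns a C array where the SQL function
would return setof text, and the names copy_piece, free_pieces and
split_on are hypothetical. As a simplification, a zero-length match is
treated as "no further separator".

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Copy bytes [start, end) of s into a new NUL-terminated string. */
static char *
copy_piece(const char *s, size_t start, size_t end)
{
    size_t len = end - start;
    char  *out = malloc(len + 1);

    if (out != NULL) {
        memcpy(out, s + start, len);
        out[len] = '\0';
    }
    return out;
}

/* Free an array of n strings returned by split_on. */
static void
free_pieces(char **v, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        free(v[i]);
    free(v);
}

/*
 * Split str on every non-empty match of pattern.
 * *nparts receives the number of pieces. Returns NULL on a bad pattern
 * or on allocation failure.
 */
static char **
split_on(const char *pattern, const char *str, size_t *nparts)
{
    regex_t    re;
    regmatch_t m;
    char     **parts = NULL, **tmp;
    size_t     count = 0, pos = 0, len = strlen(str);

    *nparts = 0;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;

    for (;;) {
        size_t end = len;     /* end of the current piece */
        size_t next = len;    /* where the next search starts */
        /* search the rest of the string; past offset 0 it is not line start */
        int found = pos < len &&
                    regexec(&re, str + pos, 1, &m, pos ? REG_NOTBOL : 0) == 0 &&
                    m.rm_eo > m.rm_so;

        if (found) {
            end = pos + (size_t) m.rm_so;
            next = pos + (size_t) m.rm_eo;
        }
        tmp = realloc(parts, (count + 1) * sizeof(char *));
        if (tmp == NULL)
            goto fail;
        parts = tmp;
        parts[count] = copy_piece(str, pos, end);
        if (parts[count] == NULL)
            goto fail;
        count++;
        if (!found)
            break;            /* the last piece has been emitted */
        pos = next;
    }
    regfree(&re);
    *nparts = count;
    return parts;

fail:
    free_pieces(parts, count);
    regfree(&re);
    return NULL;
}
```

For example, split_on(", *", "a, b,c", &n) produces the three pieces "a",
"b" and "c"; a set-returning version would hand these back one row at a
time.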

Let me know whether I should build on the attached patch and write the
functions for contrib or pgfoundry, put the functions in regexp.c and try
to get them into core, or both. (It made my life a lot easier while
working on the function not to have to restart the postmaster every time I
recompiled it; it may be nice in the future to be able to write extensions
like this.)

--
To err is human, to forgive, beyond the scope of the Operating System.

Attachment            Content-Type   Size
regexp_ext.c          text/plain     1.8 KB
regexp-export.patch   text/plain     9.3 KB
