Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-05-26 03:51:45
Message-ID: AANLkTilIkP5cqXLiVahSCOYuyU6gMu8Bdcany2yHtzOH@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, May 24, 2010 at 9:16 AM, Daniele Varrazzo
<daniele(dot)varrazzo(at)gmail(dot)com> wrote:
> regexp_matches() has been recently discussed
> (http://archives.postgresql.org/pgsql-bugs/2010-04/msg00026.php): it is a
> setof function and as such it can drop results.
>
> Unfortunately it is an useful function to newcomers who use SQL, use regexps
> but don't arrive to read the chapter 34.4.7. (i.e. Extending SQL -> Query
> Language (SQL) Functions -> SQL Functions Returning Sets) and are not so
> enlightened to know that "setof text[]" means "if it doesn't match, it drops
> the record". They just expect the function to be a LIKE on steroids.
>
> Please describe the behavior in the documentation of the function (i.e.
> table 9-6. and section 9.7.3), possibly provide a function with a saner
> interface, i.e. returning a text[] of the first match or NULL on no match,
> or document a workaround (suitable for an user knowing regexps but not
> setof-returning functions) to make the function not dropping record (e.g. I
> fixed the "bug" adding a "|" at the end of the pattern, so that the function
> returns an array of NULL in case of no match: I don't think it is a trivial
> workaround).

I'm not sure that it's very productive to refer to the behavior of our
code as insane. We do document this in section 9.7.3, pretty
explicitly:

"The regexp_matches function returns all of the captured substrings
resulting from matching a POSIX regular expression pattern. It has the
syntax regexp_matches(string, pattern [, flags ]). If there is no
match to the pattern, the function returns no rows. If there is a
match, the function returns a text array whose n'th element is the
substring matching the n'th parenthesized subexpression of the pattern
(not counting "non-capturing" parentheses; see below for details)."

I think that's pretty clear. Your mileage may vary, of course.

I'm less confident than what we have in table 9-6 (other string
functions, in section 9.4, string functions and operators) is clear on
first reading, but neither do I immediately know how to improve it.
Perhaps instead of critiquing our insanity you could provide some
specific suggestions for improvement.

Similarly, if you think we should have another function besides
regexp_matches(), rather than just complaining that we don't, it would
be more useful to suggest a name and a specific behavior and ideally
maybe even provide a patch (or just the docs portion of a patch) -
especially if you can point to a specific use-case that is hard to do
with the SRF but would be easier with a function with a different
interface.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Mark Kirkwood 2010-05-26 04:14:02 Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Previous Message Robert Haas 2010-05-26 03:36:18 Re: Ola