Re: BUG #4044: Incorrect RegExp substring Output

From: "Rui Martins" <Rui(dot)Martins(at)PDMFC(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4044: Incorrect RegExp substring Output
Date: 2008-03-19 11:42:10
Message-ID: 1580.B1UHWUVdEF8=.1205926930.squirrel@www.pdmfc.com
Lists: pgsql-bugs

Hi Tom

Just a side note; see comments below.

> "Rui Martins" <Rui(dot)Martins(at)pdmfc(dot)com> writes:
>> Description: Incorrect RegExp substring Output
>
>> SUBSTRING( BedNo FROM '^[[:digit:]]+[a-zA-Z]*(:[[:digit:]]+)?$' )
>
> Interesting. It had never occurred to me that it's possible for the
> whole pattern to have a match when some parenthesized subexpression
> has no match. On investigation, Tcl's regex library seems to get
> this right, but textregexsubstr() doesn't. Will fix.
>
>> I would expect the result for BedNumber to be either NULL or the EMPTY
>> String, and the latter seems more logical.
>
> It's going to be null. Your example has no match to the parenthesized
> substring --- a match would have to include a colon and some digits, no?

You mention that it will return NULL when the subexpression does not match!

Here, the meaning of the word "match" may be misleading us in this
conversation.
I say this because, in my report, the second substring expression (the one
for RoomSize):

SUBSTRING( BedNo, '^[[:digit:]]+([a-zA-Z]*)(:[[:digit:]]+)?$' ) AS RoomSize,

actually returns an EMPTY string, not NULL, for the first two test
cases, which I believe is the correct answer.

From what I can infer from your definition of "match" in your last
sentence, this should NOT be a MATCH for the subexpression, since it would
be an EMPTY match.
However, it returns an EMPTY string instead of NULL; i.e., it returns what I
expect, not what you said it should return in case of NO MATCH.
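To make the EMPTY-versus-NULL distinction concrete outside PostgreSQL, here is a minimal sketch using Python's `re` module. This is an assumption: Python's engine is not PostgreSQL's Tcl-derived regex library, but it draws the same line. A group that takes part in the match with zero characters returns `''`, while a group that does not take part at all returns `None`. The helper name `groups_for` and the sample bed numbers are hypothetical; they are not the test cases from the report.

```python
import re

# RoomSize-style pattern from the report, translated to Python syntax.
# The POSIX class [[:digit:]] is written as [0-9] because Python's re
# does not support POSIX bracket classes.
ROOM_PATTERN = re.compile(r'^[0-9]+([a-zA-Z]*)(:[0-9]+)?$')


def groups_for(bed_no: str):
    """Return (group 1, group 2) for a bed number, or None if there is no overall match."""
    m = ROOM_PATTERN.match(bed_no)
    if m is None:
        return None
    # group(1) is '' when it matched zero letters;
    # group(2) is None when the optional ":digits" part did not take part.
    return m.group(1), m.group(2)


for sample in ['12', '12A', '12A:3']:  # hypothetical bed numbers
    print(sample, groups_for(sample))
```

For `'12'` this gives `('', None)`: the letter group participates with an empty match, while the colon group never participates.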

I usually think of a "match" as something that validates as correct, and
hence returns something. But I have to admit that I usually think about a
global match, i.e. the entire expression matching, and not about a
sub-expression match.

Even though this may be seen as argumentative, consider this expression:

(something)?

It will "match" an empty string in the context of a full expression, and
will return an EMPTY string. So, by analogy, I would expect a sub-expression
to return the same when it actually has a "match", even if that match is an
empty substring.
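For comparison, here is how one other engine treats the `(something)?` case. This minimal sketch again assumes Python's `re` as a stand-in, not PostgreSQL. There, `(something)?` does match the empty string as a whole expression, but the group inside it does not participate, so it reports `None`. Moving the `?` inside the group makes the group always participate, so it captures `''`. The pattern and helper names are hypothetical. Which of these PostgreSQL's `substring()` should mirror is exactly the question raised here; the sketch only shows where Python draws the line.

```python
import re

# The whole group is optional: on empty input the overall match
# succeeds, but group 1 never takes part in it.
OPTIONAL_GROUP = re.compile(r'(something)?')

# The optionality is inside the group: group 1 always takes part
# and may capture an empty string.
GROUP_OF_OPTIONAL = re.compile(r'((?:something)?)')


def describe(pattern: re.Pattern, text: str):
    """Return (whole match, group 1) for a full match of text, or None."""
    m = pattern.fullmatch(text)
    if m is None:
        return None
    return m.group(0), m.group(1)


print(describe(OPTIONAL_GROUP, ''))             # ('', None)
print(describe(GROUP_OF_OPTIONAL, ''))          # ('', '')
print(describe(OPTIONAL_GROUP, 'something'))    # ('something', 'something')
```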

My expectations and assumptions might be wrong, but I believe they are
correct. Please check this too.

Once again, thank you for your quick feedback.

Best regards
Rui Martins
