Re: BUG #4044: Incorrect RegExp substring Output

From: "Rui Martins" <Rui(dot)Martins(at)PDMFC(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4044: Incorrect RegExp substring Output
Date: 2008-03-20 15:21:49
Message-ID: 3830.B1UHWUVdEF8=.1206026509.squirrel@www.pdmfc.com
Lists: pgsql-bugs

Hi Tom

> "Rui Martins" <Rui(dot)Martins(at)PDMFC(dot)com> writes:
>> My reasoning is:
>> Why would the exact same sub-expression, return different results when
>> either preceded or followed by something.
>
> It *isn't* returning different results; you are testing for different
> things in these two cases, namely whether there is a match to the whole
> pattern or just a parenthesized subpattern. In none of these examples
> was there any match to '(something)' --- there couldn't possibly be,
> because "something" isn't in the data string.
>
> regards, tom lane

That's one way to look at it. That's why I mentioned the possibility of
different assumptions regarding the context of the word "match".

In fact, you are saying that the sub-expression did not "match" because
there wasn't "something" in the string to be matched!
I agree with you on this last part: there wasn't "something" in the string
to be matched. But the sub-expression did "match"!

I say this because the empty string is a valid "match" for
"(something)?": the "?" question mark operator is defined as "a
sequence of 0 or 1 matches of the atom".
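To make the "0 or 1 matches" reading concrete, here is a small sketch using Python's `re` module. This is only an assumption-laden stand-in (Python's engine is not the one PostgreSQL uses), and `whole_match` is a hypothetical helper. It shows that the whole pattern '(something)?' always succeeds, because the "?" lets it match zero occurrences, i.e. the empty string.

```python
import re

# '(something)?' means "zero or one occurrence of 'something'", so the
# whole pattern can always succeed by matching zero occurrences.
PATTERN = r'(something)?'

def whole_match(text):
    """Return the text matched by the entire pattern, or None if no match."""
    m = re.search(PATTERN, text)
    return None if m is None else m.group(0)

# Both inputs match; in each case the matched text is the empty string.
print(repr(whole_match('')))              # ''
print(repr(whole_match('TEST')))          # ''
print(re.search(PATTERN, 'TEST').span())  # (0, 0): empty match at position 0
```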

So we are probably just discussing semantics here!

My concern is that many people will make the same (perhaps refutable)
"valid" assumptions that I do.

And if they get NULL instead of an EMPTY string, it will be awkward;
besides, they won't be able to distinguish between an EMPTY "match" and NO
"match" at all, since both will return NULL according to your definition.
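For comparison, one other engine does keep these outcomes apart. The sketch below uses Python's `re` module purely as an illustration (not PostgreSQL's behavior), with a hypothetical helper `group_result`. In Python a group that matched zero characters yields '', while an optional group that did not take part in the match yields None; a failed match of the whole pattern is a third, separate case. Where the "?" sits decides which of the first two you get.

```python
import re

def group_result(pattern, text):
    """Return group 1 of the first match (None if the group did not take part)."""
    m = re.search(pattern, text)
    if m is None:
        raise ValueError('whole pattern did not match')
    return m.group(1)

# '?' outside the group: the group itself is optional and, with no
# 'something' in the input, it does not participate at all -> None.
outside = group_result(r'(something)?', 'TEST')

# '?' inside the group: the group always participates, possibly matching
# zero characters -> '' (an empty, but real, match).
inside = group_result(r'((?:something)?)', 'TEST')

print(repr(outside))  # None
print(repr(inside))   # ''
```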

But what I find odd is that you say I'm testing different things. So
what would you say about the following?

'(something)?'

NOTE: I removed the anchors only.

Now, is this a full string match or a sub-expression match?

We can't give a concrete answer unless we know the concrete string to be
matched.

SELECT '' ~ '(something)?';

This will be a FULL match.

SELECT 'TEST' ~ '(something)?';

But this one won't! By your definition, it will be a sub-expression match.
So using the EXACT same regexp, we get two different contexts,
depending on the input!

The regexp context MUST NOT depend on the string to be matched;
if it does, that is VERY BAD for consistency.

Do you see my point now?

Now try this:

SELECT SUBSTRING( '', '(something)?' );

SELECT SUBSTRING( 'TEST', '(something)?' );

Oddly enough, this currently returns the correct answer for both queries!
And by correct I mean an EMPTY string!

According to your assumption, the first would return an EMPTY string, but
the second would return NULL!

You should try this with other regexp implementations and see what they
return as the sub-expression result.
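As one data point for that suggestion, here is a sketch with Python's `re` module, chosen only for illustration (it is not the engine PostgreSQL uses); `results` is a hypothetical helper. It reports both the text matched by the whole pattern and the first sub-expression for the two inputs used above.

```python
import re

PATTERN = r'(something)?'

def results(text):
    """Return (whole-pattern match, first sub-expression) for the first match."""
    m = re.search(PATTERN, text)
    # group(0) is the whole match; group(1) is None if the group did not take part
    return m.group(0), m.group(1)

for text in ('', 'TEST'):
    print(repr(text), results(text))
# '' ('', None)
# 'TEST' ('', None)
```

In this engine both inputs behave identically: the whole pattern matches an empty string, and the optional group is reported as not having participated (None) rather than as an empty string.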

If, after this explanation, I still haven't managed to convey the
problem, it's probably my inability to explain it better, or my
less-than-perfect English, since it's not my native language.

Hope you understand this now, since I don't know how to explain it better.

Thank you for your feedback.

Best Regards
Rui Martins
