Re: Notes about behaviour of SIMILAR TO operator

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Adam Buraczewski <adamb(at)nor(dot)pl>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Notes about behaviour of SIMILAR TO operator
Date: 2003-11-21 16:05:16
Message-ID: 12635.1069430716@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Adam Buraczewski <adamb(at)nor(dot)pl> writes:
> ('a' similar to 'a' escape null) is true (should be unknown!)

Yeah, you are right; this is because we are overloading a "null" second
parameter to mean "the ESCAPE part wasn't present", which in hindsight
wasn't such a hot idea.

> I think that either PostgreSQL should check for nulls in SIMILAR TO
> construct before calling similar_escape(), or there should be two
> versions of similar_escape() function: one getting only one argument
> (for SIMILAR TO without ESCAPE) and second, getting two arguments
> (a pattern and an escape char). Which solution is better?

I think the latter is the only reasonable solution, but it will be
something that we cannot implement in the 7.4.* series, because adding
another function implies initdb. I'd suggest submitting one patch that
fixes everything but the NULL problem, which we could back-patch into
7.4, and then a second patch that splits the function into two for 7.5.

>> As near as I can tell, the SQL spec requires special characters to be
>> escaped when they are inside a bracket construct. So indeed the above
>> are invalid SQL regexes.

> How the function should behave when such an invalid pattern is passed
> as its argument? Should it throw an error (this is what SQL spec
> says) or tolerate as much mistakes as possible, generating some
> warnings only?

I don't have a strong opinion --- could go with either behavior. You
might want to take it up on the pgsql-sql list.

>> Good point. Actually, do we want to force ARE mode, or something simpler?
>> Perhaps ERE or even BRE would be a better match to the SQL spec.

> I think that there is no difference which regexp dialect is choosen,
> only the speed matters. Function translating SIMILAR TO patterns into
> POSIX regular expressions will be more or less the same. What should
> I choose then?

I doubt there would be any speed difference. The advantage of a dumber
RE flavor is that it would have fewer "extra" features that might be
unintentionally triggered by a translated pattern, leading to just the
sort of non-SQL-compliant behavior you are complaining of ...

> BTW, should I write some regression tests for SIMILAR TO?

Sure. Look at some of the existing regression tests for examples.

> Should the changes be written for CVS HEAD only or 7.4/7.3 branches
> either?

I don't see that we'd bother applying it to 7.3, but 7.4 branch yes,
if you avoid any changes in the function's API for the 7.4 version.

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message David Fetter 2003-11-21 22:02:34 PERFORM bug with FOUND?
Previous Message Adam Buraczewski 2003-11-21 15:44:51 Re: Notes about behaviour of SIMILAR TO operator