Re: Notes about behaviour of SIMILAR TO operator

From: Adam Buraczewski <adamb(at)nor(dot)pl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Notes about behaviour of SIMILAR TO operator
Date: 2003-11-21 15:44:51
Message-ID: 20031121154451.GB577@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hallo Tom,

I decided to improve similar_escape() function during the weekend.
Thank you very much for the excerpt from SQL standard (I think this is
much more complete than the text I found in a Working Draft from
August 1994). However, there are some more issues I'd like to make
clear before function redesigning.

First of all, I found that SQL spec says that when one of the items of
SIMILAR TO construct (<character match value>, <similar pattern> or
<escape character>) is NULL, then result of the whole construct should
be unknown. However, PostgreSQL treats SIMILAR TO ... ESCAPE NULL the
same way as when no ESCAPE clause is present, which is wrong:

('a' similar to 'a') is true
('a' similar to 'a' escape null) is true (should be unknown!)

The behaviour above is caused by the fact that escape character is
passed to similar_escape() without checking for null value, and the
same null value is passed to it when there is no ESCAPE clause at all.
I think that either PostgreSQL should check for nulls in SIMILAR TO
construct before calling similar_escape(), or there should be two
versions of similar_escape() function: one getting only one argument
(for SIMILAR TO without ESCAPE) and second, getting two arguments
(a pattern and an escape char). Which solution is better?

> As near as I can tell, the SQL spec requires special characters to be
> escaped when they are inside a bracket construct. So indeed the above
> are invalid SQL regexes.

How the function should behave when such an invalid pattern is passed
as its argument? Should it throw an error (this is what SQL spec
says) or tolerate as much mistakes as possible, generating some
warnings only?

> Good point. Actually, do we want to force ARE mode, or something simpler?
> Perhaps ERE or even BRE would be a better match to the SQL spec.

I think that there is no difference which regexp dialect is choosen,
only the speed matters. Function translating SIMILAR TO patterns into
POSIX regular expressions will be more or less the same. What should
I choose then?

BTW, should I write some regression tests for SIMILAR TO? Is there
any guide how to do it (except PostgreSQL sources)? Should the
changes be written for CVS HEAD only or 7.4/7.3 branches either?

Regards,

--
Adam Buraczewski <adamb (at) nor (dot) pl> * Linux user #165585
GCS/TW d- s-:+>+:- a C+++(++++) UL++++$ P++ L++++ E++ W+ N++ o? K w--
O M- V- PS+ !PE Y PGP+ t+ 5 X+ R tv- b+ DI D G++ e+++>++++ h r+>++ y?

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2003-11-21 16:05:16 Re: Notes about behaviour of SIMILAR TO operator
Previous Message kchen 2003-11-21 08:53:06 7.4: FATAL: unrecognized configuration parameter "show_statement_stats"