Re: SQL-spec incompatibilities in similar_escape() and related stuff

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SQL-spec incompatibilities in similar_escape() and related stuff
Date: 2019-05-14 16:22:03
Message-ID: 21793.1557850923@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

[ backing up to a different sub-discussion ]

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Tom> To support the first usage, similar_escape is non-strict, and it
> Tom> takes a NULL second argument to mean '\'. This is already a SQL
> Tom> spec violation, because as far as I can tell from the spec, if you
> Tom> don't write an ESCAPE clause then there is *no* escape character;
> Tom> there certainly is not a default of '\'. However, we document this
> Tom> behavior, so I don't know if we want to change it.

> This is the same spec violation that we also have for LIKE, which also
> is supposed to have no escape character in the absense of an explicit
> ESCAPE clause.

Right. After further thought, I propose that what we ought to do is
unify LIKE, SIMILAR TO, and 3-arg SUBSTRING on a single set of behaviors
for the ESCAPE argument:

1. They are strict, ie a NULL value for the escape string produces a
NULL result. This is per spec, and we don't document anything different,
and nobody would really expect something different. (But see below
about keeping similar_escape() as a legacy compatibility function.)

2. Omitting the ESCAPE option (not possible for SUBSTRING) results in a
default of '\'. This is not per spec, but we've long documented it this
way, and frankly I'd say that it's a far more useful default than the
spec's behavior of "there is no escape character". I propose that we
just document that this is not-per-spec and move on.

3. Interpret an empty ESCAPE string as meaning "there is no escape
character". This is not per spec either (the spec would have us
throw an error) but it's our historical behavior, and it seems like
a saner approach than the way the spec wants to do it --- in particular,
there's no way to get that behavior in 3-arg SUBSTRING if we don't allow
this.

So only point 1 represents an actual behavioral change from what we've
been doing; the other two just require doc clarifications.

Now, I don't have any problem with changing what happens when somebody
actually writes "a LIKE b ESCAPE NULL"; it seems fairly unlikely that
anyone would expect that to yield a non-null result. However, we do
have a problem with the fact that the implementation is partially
exposed:

regression=# create view v1 as select f1 similar to 'x*' from text_tbl;
CREATE VIEW
regression=# \d+ v1
...
View definition:
SELECT text_tbl.f1 ~ similar_escape('x*'::text, NULL::text)
FROM text_tbl;

If we just change similar_escape() to be strict, then this view will
stop working, which is a bit hard on users who did not write anything
non-spec-compliant.

I propose therefore that we leave similar_escape in place with its
current behavior, as a compatibility measure for cases like this.
Intead, invent two new strict functions, say
similar_to_escape(pattern)
similar_to_escape(pattern, escape)
and change the parser and the implementation of SUBSTRING() to
rely on these going forward.

The net effect will be to make explicit "ESCAPE NULL" spec-compliant,
and to get rid of the performance problem from inlining failure for
substring(). All else is just doc clarifications.

Comments?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2019-05-14 17:45:21 VACUUM fails to parse 0 and 1 as boolean value
Previous Message Alvaro Herrera 2019-05-14 15:32:52 Re: Inconsistent error message wording for REINDEX CONCURRENTLY