Re: [PATCHES] LIKE indexing

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] LIKE indexing
Date: 2001-08-20 17:04:56
Message-ID: Pine.LNX.4.30.0108201832370.822-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Tom Lane writes:

> How can A = B not imply A LIKE B?

Well, according to my reading of the spec, it apparently can. Space
padding can be weird that way. But see below why I think there are much
worse alternatives.

> 4) If the i-th substring specifier of PCV is neither an
> arbitrary character specifier nor an arbitrary string
> specifier, then the i-th substring of MCV is equal to
> that substring specifier according to the collating
> sequence of the <like predicate>, without the appending
> of <space> characters to MCV, and has the same length as
> that substring specifier.
>
> The bit about "without the appending of <space> characters" *might*
> mean that LIKE is always supposed to treat trailing blanks as
> significant, but I'm not sure.

That's how I read it.

> The text does seem to say that it's okay to add trailing blanks to the
> pattern to produce a match, when the collating sequence is PAD SPACE
> type (bpchar in our terms).

I can't find that.

> In any case, Hiroshi is dead right that LIKE is supposed to perform
> collating-sequence-dependent comparison,

As I have answered to Hiroshi, I think that would really be brain-dead.
It would alienate LIKE from how pattern matching normally operates. If we
make the assumption that strcoll(A, B) can be 0 for wildly different
values of A and B (for an appropriate definition of "different"), then the
following things could happen:

-> A = B does not imply A ~ B

-> A LIKE 'foobar%' does not imply A LIKE 'foo%' (because 'foobar' is a
single collating element that sorts like 'xyz').

-> A LIKE '%foo%' does not imply that POSITION('foo' IN A) <> 0 (The SQL
POSITION function does not mention using the collating sequence.)

I'm also quite suspicious about the wording "...and has the same length as
that substring specifier." For instance, it might be nearly reasonable to
define a German locale where ü (u umlaut) and ue are equivalent. But then
while 'xüy' = 'xuey' (a strict interpretation of the SQL standard might
deny this because of the padding, but "The result of the comparison of X
and Y is given by the collating sequence CS.", and I define mine that
way), but 'xüy' NOT LIKE 'xuey' because of that rule. Voilà, it can
happen after all.

I think this rule is a mistake designed by committee and must be struck
down by community. ;-)

> and this probably means that this whole approach is a dead end :-(

Blech... ;-)

--
Peter Eisentraut peter_e(at)gmx(dot)net http://funkturm.homeip.net/~peter

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2001-08-20 17:11:47 Re: Using textin/textout vs. scribbling around
Previous Message Justin Clift 2001-08-20 16:50:11 Idea: Worthwhile creating a wrapper script to automate pg_hba.conf entries?

Browse pgsql-patches by date

  From Date Subject
Next Message Bruce Momjian 2001-08-20 18:32:08 Re: New version of XML support routines (0.2)
Previous Message Garrett Wollman 2001-08-20 16:46:29 Re: Patch: use SCM_CREDS authentication over PF_LOCAL sockets