Re: like/ilike improvements

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: like/ilike improvements
Date: 2007-05-25 03:21:35
Message-ID: 4656563F.50608@dunslane.net
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Tom Lane wrote:
>>
>>> You have to be on a first byte before you can meaningfully apply
>>> NextChar, and you have to use NextChar or else you don't count
>>> characters correctly (eg "__" must match 2 chars not 2 bytes).
>>>
>
>
>> Yes, I agree completely. However it looks to me like IsFirstByte will in
>> fact always be true when we get to call NextChar for matching "_" for UTF8.
>>
>
> If that's true, the patch is failing to achieve its goal of treating %
> bytewise ...
>

Let's back up. % processing works by looking for a place in the text
that might match what follows % in the pattern, and then calling itself
recursively. For UTF8, if what follows % is _, it does that search by
repeatedly calling NextChar - otherwise it calls NextByte. But if we're
not processing a wildcard we have to match an actual complete UTF8 char,
so the fact that we proceed byte-wise won't get us out of sync whenever
we happen to encounter an _. We can't rely on that process for other
multi-byte charsets because the suffix of one char might be the prefix
of another, so we could get false matches. That can't happen with UTF8,
where a continuation byte can never be mistaken for a first byte.
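
To make the argument concrete, here is a rough sketch in Python rather
than the backend's C. It is not the code from the patch: like_utf8,
next_char_len, is_first_byte and off_boundary_hits are made-up names,
escape handling is left out, and the input is assumed to be valid UTF8.
The matcher steps bytewise after % unless the next pattern element is _,
and asserts that NextChar-style stepping only ever starts on a first
byte. The last helper reports bytewise hits that begin in the middle of
a character for a given encoding.

```python
PERCENT, UNDERSCORE = ord("%"), ord("_")


def is_first_byte(b: int) -> bool:
    # UTF8 continuation bytes all look like 10xxxxxx.
    return (b & 0xC0) != 0x80


def next_char_len(first_byte: int) -> int:
    # Byte length of the UTF8 character starting at first_byte.
    assert is_first_byte(first_byte), "stepped off a character boundary"
    if first_byte < 0x80:
        return 1
    if first_byte < 0xE0:
        return 2
    if first_byte < 0xF0:
        return 3
    return 4


def like_utf8(text: bytes, pat: bytes) -> bool:
    # Simplified LIKE over UTF8 bytes: % and _ only, no escape character.
    t = p = 0
    while p < len(pat):
        c = pat[p]
        if c == PERCENT:
            while p < len(pat) and pat[p] == PERCENT:
                p += 1                      # collapse runs of %
            if p == len(pat):
                return True                 # trailing % matches the rest
            if pat[p] == UNDERSCORE:
                # What follows % is _, so search character by character.
                while t < len(text):
                    if like_utf8(text[t:], pat[p:]):
                        return True
                    t += next_char_len(text[t])
                return False
            # Otherwise search byte by byte for the next literal's first byte.
            while t < len(text):
                if text[t] == pat[p] and like_utf8(text[t:], pat[p:]):
                    return True
                t += 1
            return False
        if t >= len(text):
            return False
        if c == UNDERSCORE:
            t += next_char_len(text[t])     # must consume one whole character
        elif text[t] != c:
            return False
        else:
            t += 1                          # literal bytes compare one at a time
        p += 1
    return t == len(text)


def off_boundary_hits(text: str, needle: str, encoding: str) -> list:
    # Byte offsets where needle's bytes occur in text but not at a char start.
    hay, pat = text.encode(encoding), needle.encode(encoding)
    boundaries, pos = set(), 0
    for ch in text:
        boundaries.add(pos)
        pos += len(ch.encode(encoding))
    hits, start = [], hay.find(pat)
    while start != -1:
        if start not in boundaries:
            hits.append(start)
        start = hay.find(pat, start + 1)
    return hits
```

For instance, like_utf8("héllo wörld".encode(), "h_llo%w_rld".encode())
goes through the bytewise search after % and still reaches each _ on a
first byte, so the assertion never fires. With Python's shift_jis codec,
off_boundary_hits("メア", "＜", "shift_jis") reports a hit that straddles
the two characters, which is the kind of false match described above;
the same call with "utf-8" reports none.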

cheers

andrew
