From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: like/ilike improvements |
Date: | 2007-05-25 03:21:35 |
Message-ID: | 4656563F.50608@dunslane.net |
Lists: | pgsql-hackers pgsql-patches |
Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Tom Lane wrote:
>>
>>> You have to be on a first byte before you can meaningfully apply
>>> NextChar, and you have to use NextChar or else you don't count
>>> characters correctly (eg "__" must match 2 chars not 2 bytes).
>>>
>
>
>> Yes, I agree completely. However it looks to me like IsFirstByte will in
>> fact always be true when we get to call NextChar for matching "_" for UTF8.
>>
>
> If that's true, the patch is failing to achieve its goal of treating %
> bytewise ...
>
Let's back up. % processing works by looking for a place in the text
that might match what follows % in the pattern, and then calling itself
recursively. For UTF8, if what follows % is _, it does that search by
repeatedly calling NextChar - otherwise it calls NextByte. But if we're
not processing a wildcard we have to match an actual complete UTF8 char,
so the fact that we proceed byte-wise won't get us out of sync whenever
we happen to encounter an _. We can't rely on that process for other
multi-byte charsets because the suffix of one char might be the prefix
of another, so we could get false matches. That can't happen with UTF8.
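
To make the argument above concrete, here is a minimal sketch of the scheme in C. It is an illustration only, not the code from the patch: the names like_utf8, like_utf8_cstr, is_first_byte and next_char are made up for this example, escape handling and the early-abort optimisation are left out, and both text and pattern are assumed to be valid UTF8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A UTF8 continuation byte always has the bit pattern 10xxxxxx. */
static bool is_first_byte(unsigned char c)
{
    return (c & 0xC0) != 0x80;
}

/* Skip one whole character: its lead byte plus any continuation bytes.
 * The caller must guarantee *tlen > 0. */
static void next_char(const char **t, size_t *tlen)
{
    do {
        (*t)++;
        (*tlen)--;
    } while (*tlen > 0 && !is_first_byte((unsigned char) **t));
}

/* Hypothetical simplified LIKE matcher ('%' and '_' only, no escapes). */
static bool like_utf8(const char *t, size_t tlen, const char *p, size_t plen)
{
    while (plen > 0) {
        if (*p == '%') {
            /* A run of % behaves like a single %. */
            while (plen > 0 && *p == '%') { p++; plen--; }
            if (plen == 0)
                return true;            /* trailing % matches the rest */

            /* If _ follows, candidate positions must be character starts,
             * so step by character; otherwise stepping by byte is enough. */
            bool by_char = (*p == '_');

            while (tlen > 0) {
                if (like_utf8(t, tlen, p, plen))
                    return true;
                if (by_char)
                    next_char(&t, &tlen);
                else { t++; tlen--; }
            }
            return false;
        }
        if (tlen == 0)
            return false;               /* pattern left over, text used up */
        if (*p == '_') {
            next_char(&t, &tlen);       /* _ consumes one char, not one byte */
            p++; plen--;
        } else {
            /* Literal: compare byte-wise. A lead byte in the pattern can
             * never equal a continuation byte in the text, so a match
             * cannot start in the middle of a UTF8 character. */
            if (*t != *p)
                return false;
            t++; tlen--; p++; plen--;
        }
    }
    return tlen == 0;
}

/* Convenience wrapper for NUL-terminated strings. */
static bool like_utf8_cstr(const char *text, const char *pattern)
{
    return like_utf8(text, strlen(text), pattern, strlen(pattern));
}
```

For example, with the text "x\xC3\xA9y" (x, e-acute, y), like_utf8_cstr accepts the pattern "x_y" and rejects "x__y", because _ consumes the two-byte character as a single unit. The byte-wise literal comparison is only safe because of the UTF8 property described above; for a charset where a trailing byte can look like a lead byte, the search after % would have to step by character in every case.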
cheers
andrew
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2007-05-25 03:34:13 | Re: like/ilike improvements |
Previous Message | Tom Lane | 2007-05-25 03:20:51 | Re: like/ilike improvements |