From: | Andrew - Supernews <andrew+nonews(at)supernews(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: LIKE optimization in UTF-8 and locale-C |
Date: | 2007-03-23 06:00:20 |
Message-ID: | slrnf06r7k.7me.andrew+nonews@atlantis.supernews.net |
Lists: | pgsql-hackers pgsql-patches |
On 2007-03-22, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
>> I found LIKE operators are slower on multi-byte encoding databases
>> than single-byte encoding ones. It comes from difference between
>> MatchText() and MBMatchText().
>
>> We've had an optimization for single-byte encodings using
>> pg_database_encoding_max_length() == 1 test. I'll propose to extend it
>> in UTF-8 with locale-C case.
>
> If this works for UTF8, won't it work for all the backend-legal
> encodings?
It works for UTF8 only because UTF8 has special properties which are not
shared by, for example, EUC_*. Specifically, in UTF8 the octet sequence
for a multibyte character will never appear as a subsequence of the octet
sequence of a string of other multibyte characters. That is, given a string
of two two-octet characters AB, the second octet of A plus the first octet
of B is not a valid UTF8 character (and likewise for longer characters).
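
To make the property concrete, here is a small Python sketch (not taken from the patch; the helper names straddle_is_valid and misaligned_matches are made up for illustration). It takes the last octet of one character plus the first octet of the next and checks whether that pair decodes as a character, then looks for octet-wise matches that start in the middle of a character. It assumes Python's "utf-8" and "euc_jp" codecs as stand-ins for the backend encodings UTF8 and EUC_JP.

```python
def straddle_is_valid(a, b, encoding):
    """True if the last octet of a plus the first octet of b decodes as text."""
    straddle = a.encode(encoding)[-1:] + b.encode(encoding)[:1]
    try:
        straddle.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def char_boundaries(text, encoding):
    """Octet offsets at which a character of text starts."""
    offsets = set()
    pos = 0
    for ch in text:
        offsets.add(pos)
        pos += len(ch.encode(encoding))
    return offsets


def misaligned_matches(haystack, needle, encoding):
    """Octet offsets where needle matches haystack octet-wise but mid-character."""
    hay = haystack.encode(encoding)
    pat = needle.encode(encoding)
    boundaries = char_boundaries(haystack, encoding)
    hits = []
    start = hay.find(pat)
    while start != -1:
        if start not in boundaries:
            hits.append(start)
        start = hay.find(pat, start + 1)
    return hits


# Two two-octet characters in each encoding (arbitrary sample characters).
utf8_a, utf8_b = "\u00e9", "\u00fc"   # e-acute, u-umlaut: two octets each in UTF-8
euc_a, euc_b = "\u3042", "\u3044"     # hiragana A, hiragana I: two octets each in EUC_JP

utf8_straddle_ok = straddle_is_valid(utf8_a, utf8_b, "utf-8")
euc_straddle_ok = straddle_is_valid(euc_a, euc_b, "euc_jp")

# In EUC_JP the straddling octets form a character of their own, so an
# octet-wise search for that character hits inside the two-character string.
euc_text = euc_a + euc_b
euc_needle = euc_text.encode("euc_jp")[1:3].decode("euc_jp")
euc_false_hits = misaligned_matches(euc_text, euc_needle, "euc_jp")

print(utf8_straddle_ok, euc_straddle_ok, euc_false_hits)
```

The EUC_JP needle is a character that does not occur in the text at all, yet its octets are found there, which is the kind of false match an octet-wise matcher would report for the EUC_* encodings.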
(And while I haven't tested it, it looks like the patch posted doesn't
account properly for the use of _, so it needs a bit more work.)
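
For illustration only, here is a rough Python sketch of what handling _ in an octet-wise matcher involves: literal octets are compared one at a time, but _ has to consume one whole character, not one octet. This is not the code from the patch or from MatchText(); like_bytes and utf8_char_len are hypothetical names, escape handling is left out, and both arguments are assumed to be valid UTF-8.

```python
def utf8_char_len(lead):
    """Number of octets in the UTF-8 character whose first octet is lead."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead octet")


def like_bytes(text, pat):
    """LIKE over UTF-8 octets, supporting only % and _ (no escape character)."""
    if not pat:
        return not text
    head = pat[:1]
    rest = pat[1:]
    if head == b"%":
        # Try the rest of the pattern at every character boundary of text.
        i = 0
        while True:
            if like_bytes(text[i:], rest):
                return True
            if i >= len(text):
                return False
            i += utf8_char_len(text[i])
    if not text:
        return False
    if head == b"_":
        # _ matches one character, so skip all of its octets.
        return like_bytes(text[utf8_char_len(text[0]):], rest)
    # Literal octet: plain octet comparison is enough in UTF-8.
    return text[:1] == head and like_bytes(text[1:], rest)


one_char = "\u00e9".encode("utf-8")          # a single two-octet character
print(like_bytes(one_char, b"_"))            # one character, one underscore
print(like_bytes(one_char, b"__"))           # two underscores need two characters
```

If _ advanced by a single octet instead, the first call would fail and the second would succeed, which is the wrong way round.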
--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services