UTF8MatchText

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: "Andrew - Supernews" <andrew(at)supernews(dot)net>, pgsql-patches(at)postgresql(dot)org
Subject: UTF8MatchText
Date: 2007-04-02 04:56:04
Message-ID: 20070402133445.DDF8.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Lists: pgsql-hackers pgsql-patches

"Andrew - Supernews" <andrew(at)supernews(dot)net> wrote:

> ITAGAKI> I think all "safe ASCII-supersets" encodings are comparable
> ITAGAKI> by bytes, not only UTF-8.
>
> This is false, particularly for EUC.

Umm, I see. I updated the patch so that the optimization is used only in the
UTF8 case. I also added some inlining hints that help on my machine (Pentium 4).
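
For readers following the thread, here is one way "comparable by bytes" can be read for UTF-8. It is only a minimal sketch, not the code in the attached patch: literal pattern bytes are compared one byte at a time, while the wildcards step over whole characters so the text pointer stays on a character boundary. The names utf8_char_len() and utf8_like_sketch() are hypothetical, and the sketch assumes valid UTF-8 input and no escape character.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: byte length of the UTF-8 character whose lead
 * byte is c. Assumes valid input; no validation is attempted. */
static size_t
utf8_char_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    return 4;
}

/* Hypothetical simplified LIKE matcher ('%' and '_' only, no escapes).
 * Literals are compared byte by byte; wildcards advance by character. */
static bool
utf8_like_sketch(const char *t, size_t tlen, const char *p, size_t plen)
{
    while (tlen > 0 && plen > 0)
    {
        if (*p == '%')
        {
            /* Collapse a run of '%'; a trailing '%' matches anything. */
            while (plen > 0 && *p == '%')
            {
                p++;
                plen--;
            }
            if (plen == 0)
                return true;

            /* Try the rest of the pattern at each character boundary. */
            while (tlen > 0)
            {
                size_t n;

                if (utf8_like_sketch(t, tlen, p, plen))
                    return true;
                n = utf8_char_len((unsigned char) *t);
                if (n > tlen)
                    n = tlen;
                t += n;
                tlen -= n;
            }
            return false;
        }
        else if (*p == '_')
        {
            /* '_' consumes exactly one character, however many bytes. */
            size_t n = utf8_char_len((unsigned char) *t);

            if (n > tlen)
                n = tlen;
            t += n;
            tlen -= n;
            p++;
            plen--;
        }
        else
        {
            /* Literal: a plain byte comparison, no per-character call. */
            if (*t != *p)
                return false;
            t++;
            tlen--;
            p++;
            plen--;
        }
    }

    /* Text exhausted: only trailing '%' may remain in the pattern. */
    while (plen > 0 && *p == '%')
    {
        p++;
        plen--;
    }
    return tlen == 0 && plen == 0;
}
```

In this sketch the literal branch never needs to know where a character ends, which is where a per-character comparison would otherwise spend its time.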

1000 runs of LIKE '%foo%' on 10000-row tables [ms]

 encoding  | HEAD  |  P1   |  P2   |  P3
-----------+-------+-------+-------+-------
 SQL_ASCII |  7094 |  7120 |  7063 |  7031
 LATIN1    |  7083 |  7130 |  7057 |  7031
 UTF8      | 17974 | 10859 | 10839 |  9682
 EUC_JP    | 17032 | 17557 | 17599 | 15240

- P1: UTF8MatchText()
- P2: P1 + __inline__ GenericMatchText()
- P3: P2 + __inline__ wchareq()
(The attached patch is P3.)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment Content-Type Size
utf8matchtext.patch application/octet-stream 17.4 KB
