UTF8MatchText

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: "Andrew - Supernews" <andrew(at)supernews(dot)net>, pgsql-patches(at)postgresql(dot)org
Subject: UTF8MatchText
Date: 2007-04-02 04:56:04
Message-ID: 20070402133445.DDF8.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Lists: pgsql-hackers, pgsql-patches
"Andrew - Supernews" <andrew(at)supernews(dot)net> wrote:

>  ITAGAKI> I think all "safe ASCII-supersets" encodings are comparable
>  ITAGAKI> by bytes, not only UTF-8.
> 
> This is false, particularly for EUC.

Umm, I see. I updated the patch so that the optimization is used only in the
UTF8 case. I also added some inlining hints that are useful on my machine
(Pentium 4).
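
To illustrate why byte-wise comparison is restricted to UTF8, here is a minimal
sketch (not code from the patch; find_bytes() and is_utf8_continuation() are
hypothetical helper names). In UTF-8 a continuation byte (10xxxxxx) can never
be mistaken for a lead byte, so a byte-wise match of a valid pattern always
starts on a character boundary. In EUC_JP both bytes of a 2-byte character lie
in the range 0xA1-0xFE, so a byte-wise match can start in the middle of a
character.

```c
#include <stddef.h>
#include <string.h>

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
int
is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Naive byte-wise substring search (hypothetical helper, for illustration).
 * Returns the byte offset of the first match of pat in text, or -1.
 */
long
find_bytes(const unsigned char *text, size_t tlen,
           const unsigned char *pat, size_t plen)
{
    size_t i;

    if (plen > tlen)
        return -1;
    for (i = 0; i + plen <= tlen; i++)
    {
        if (memcmp(text + i, pat, plen) == 0)
            return (long) i;
    }
    return -1;
}

/* EUC_JP: two 2-byte characters, A4 A2 and A4 A4 (hiragana "a", "i"). */
const unsigned char euc_text[] = {0xA4, 0xA2, 0xA4, 0xA4};
/* A different 2-byte EUC_JP character, A2 A4; it is NOT in euc_text. */
const unsigned char euc_pat[] = {0xA2, 0xA4};

/* UTF-8: the same two hiragana, E3 81 82 and E3 81 84. */
const unsigned char utf8_text[] = {0xE3, 0x81, 0x82, 0xE3, 0x81, 0x84};
/* UTF-8 pattern: the second character only. */
const unsigned char utf8_pat[] = {0xE3, 0x81, 0x84};
```

With these sample bytes, find_bytes(euc_text, 4, euc_pat, 2) reports a match at
byte offset 1, which straddles the two characters and is therefore a false
positive. find_bytes(utf8_text, 6, utf8_pat, 3) matches at offset 3, which is a
lead byte and hence a real character boundary.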


1000 runs of LIKE '%foo%' on 10000-row tables [ms]
 encoding  | HEAD  |  P1   |  P2   |  P3  
-----------+-------+-------+-------+-------
 SQL_ASCII |  7094 |  7120 |  7063 |  7031
 LATIN1    |  7083 |  7130 |  7057 |  7031
 UTF8      | 17974 | 10859 | 10839 |  9682
 EUC_JP    | 17032 | 17557 | 17599 | 15240

- P1: UTF8MatchText()
- P2: P1 + __inline__ GenericMatchText()
- P3: P2 + __inline__ wchareq()
      (The attached patch is P3.)
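
For readers unfamiliar with the kind of helper being inlined in P3, the
following is a rough sketch of a per-character equality check marked inline.
It is not the code from the attached patch: utf8_char_len() is a hypothetical
stand-in for the backend's multibyte length lookup, char_eq() is an invented
name, and C99 "static inline" is used here instead of __inline__. The input is
assumed to be valid UTF-8.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a multibyte length lookup: length in bytes of
 * the UTF-8 character starting at p, derived from its lead byte.
 */
static inline int
utf8_char_len(const unsigned char *p)
{
    if (*p < 0x80)
        return 1;
    if ((*p & 0xE0) == 0xC0)
        return 2;
    if ((*p & 0xF0) == 0xE0)
        return 3;
    if ((*p & 0xF8) == 0xF0)
        return 4;
    return 1;                   /* invalid lead byte: treat as one byte */
}

/*
 * Compare the single characters starting at p1 and p2.
 * Returns 1 if they are equal, 0 otherwise.
 */
static inline int
char_eq(const unsigned char *p1, const unsigned char *p2)
{
    int len;

    /* Quick exit: differing first bytes means differing characters. */
    if (*p1 != *p2)
        return 0;

    /* Equal lead bytes imply equal lengths in UTF-8. */
    len = utf8_char_len(p1);

    /* Compare the remaining len - 1 bytes. */
    while (--len > 0)
    {
        if (*++p1 != *++p2)
            return 0;
    }
    return 1;
}
```

Such a function is called once per character in the inner matching loop, which
is presumably why avoiding the call overhead shows up in the timings above.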

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


Attachment: utf8matchtext.patch
Description: application/octet-stream (17.4 KB)
