
Re: UTF8MatchText

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: UTF8MatchText
Date:	2007-05-17 14:20:50
Message-ID:	464C64C2.6000804@dunslane.net
Lists:	pgsql-hackers pgsql-patches

Itagaki,

I find this still fairly unclean. It certainly took me some time to get
my head around what's going on.

ISTM we should generate all these match functions from one body of code
plus some #define magic.
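The "one body of code plus some #define magic" idea might look roughly like the following. This is a minimal sketch and not PostgreSQL's actual code. Every name in it (DEFINE_MATCH_TEXT, sb_charlen, utf8_charlen) is hypothetical. It assumes valid UTF-8 input, and it handles only '%' and '_' with no escape character. The two instantiations use the SB_/MB_ naming proposed below. In a real multibyte version, the per-encoding character length would come from the backend's encoding routines rather than a hard-coded UTF-8 decoder.

```c
#include <string.h>

/* Character length for single-byte encodings: always one byte. */
static int sb_charlen(const char *s)
{
    (void) s;
    return 1;
}

/* Character length for UTF-8, read from the lead byte (assumes valid input). */
static int utf8_charlen(const char *s)
{
    unsigned char c = (unsigned char) *s;

    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    return 4;
}

/*
 * Hypothetical template: one matcher body, stamped out per encoding.
 * CHARLEN(x) reports how many bytes the character starting at x occupies.
 * Supports '%' (any run of characters) and '_' (exactly one character).
 */
#define DEFINE_MATCH_TEXT(NAME, CHARLEN)                                    \
static int NAME(const char *t, int tlen, const char *p, int plen)           \
{                                                                            \
    while (plen > 0)                                                         \
    {                                                                        \
        if (*p == '%')                                                       \
        {                                                                    \
            /* Drop the '%' and retry the rest at every char boundary. */   \
            p++;                                                             \
            plen--;                                                          \
            if (plen == 0)                                                   \
                return 1;                                                    \
            for (;;)                                                         \
            {                                                                \
                if (NAME(t, tlen, p, plen))                                  \
                    return 1;                                                \
                if (tlen <= 0)                                               \
                    return 0;                                                \
                int skip = CHARLEN(t);                                       \
                t += skip;                                                   \
                tlen -= skip;                                                \
            }                                                                \
        }                                                                    \
        if (tlen <= 0)                                                       \
            return 0;                                                        \
        if (*p == '_')                                                       \
        {                                                                    \
            /* '_' consumes exactly one (possibly multibyte) character. */  \
            int skip = CHARLEN(t);                                           \
            t += skip;                                                       \
            tlen -= skip;                                                    \
            p++;                                                             \
            plen--;                                                          \
        }                                                                    \
        else                                                                 \
        {                                                                    \
            /* A literal character must match the text byte for byte. */    \
            int clen = CHARLEN(p);                                           \
            if (clen > tlen || memcmp(t, p, clen) != 0)                      \
                return 0;                                                    \
            t += clen;                                                       \
            tlen -= clen;                                                    \
            p += clen;                                                       \
            plen -= clen;                                                    \
        }                                                                    \
    }                                                                        \
    return tlen == 0;                                                        \
}

DEFINE_MATCH_TEXT(SB_MatchText, sb_charlen)
DEFINE_MATCH_TEXT(MB_MatchText, utf8_charlen)
```

The only difference between the two generated functions is how far '_' and '%' advance. For example, `MB_MatchText("h\xC3\xA9llo", 6, "h_llo", 5)` matches because '_' consumes both bytes of "é". `SB_MatchText` rejects the same input because its '_' consumes only one byte.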

As I understand it, we have three possible encoding switches: single
byte, UTF8, and other multibyte charsets; and two possible case
settings: case sensitive and case insensitive. That would make for a
total of six functions, but for both UTF8 and other MBCS we don't need
a special case-insensitive function. Instead, we downcase both the
string and the pattern and then use the case-sensitive function. That
leaves a total of four functions.
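The "downcase both sides, then call the case-sensitive matcher" approach can be sketched as follows. This is an illustration under simplifying assumptions, not the backend's code. The helper names (match_text, downcase_copy, match_text_ic) are hypothetical. The sketch uses C's single-byte tolower() rather than a multibyte-aware lower(), and it uses a NUL-terminated single-byte matcher that supports only '%' and '_'.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Case-sensitive LIKE for NUL-terminated strings: '%' and '_' only. */
static int match_text(const char *t, const char *p)
{
    for (; *p != '\0'; p++)
    {
        if (*p == '%')
        {
            /* Try the rest of the pattern at every suffix of t, including "". */
            for (;;)
            {
                if (match_text(t, p + 1))
                    return 1;
                if (*t == '\0')
                    return 0;
                t++;
            }
        }
        if (*t == '\0')
            return 0;
        if (*p != '_' && *p != *t)
            return 0;
        t++;
    }
    return *t == '\0';
}

/* Return a malloc'd lowercase copy of s (single-byte tolower only). */
static char *downcase_copy(const char *s)
{
    size_t n = strlen(s);
    char *r = malloc(n + 1);

    if (r == NULL)
        return NULL;
    for (size_t i = 0; i <= n; i++)   /* i == n copies the terminator */
        r[i] = (char) tolower((unsigned char) s[i]);
    return r;
}

/* Case-insensitive match: lower both sides, reuse the case-sensitive matcher. */
static int match_text_ic(const char *t, const char *p)
{
    char *lt = downcase_copy(t);
    char *lp = downcase_copy(p);
    int result = (lt != NULL && lp != NULL) ? match_text(lt, lp) : 0;

    free(lt);
    free(lp);
    return result;
}
```

With this approach, `match_text_ic("Hello", "h%O")` succeeds while `match_text("Hello", "h%O")` does not. The cost is one extra pass and allocation per side, and the payoff is that no separate case-insensitive matcher is needed.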

What is not clear to me is why the UTF8 optimisation works, and why it
doesn't apply to other MBCS. At the very least we need a comment on that.
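This message does not answer the question. One plausible explanation follows, offered here as an assumption and not as the patch author's reasoning. UTF-8 is self-synchronizing: continuation bytes always have the form 10xxxxxx, and lead bytes never do. As a result, a byte-wise match of a complete, valid character can only start at a character boundary. Some other multibyte encodings lack this property. In EUC-JP, lead and trail bytes share the range 0xA1-0xFE, so a byte-wise search can "find" a two-byte code that straddles two real characters. The sketch below demonstrates only that boundary property. The helper names (utf8_is_continuation, find_bytes) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* UTF-8 continuation bytes are always 10xxxxxx; lead and ASCII bytes never are. */
static int utf8_is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Plain byte-wise search: offset of the first occurrence of needle, or -1. */
static long find_bytes(const unsigned char *hay, size_t hlen,
                       const unsigned char *needle, size_t nlen)
{
    if (nlen == 0)
        return 0;
    for (size_t i = 0; i + nlen <= hlen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return (long) i;
    return -1;
}

/* EUC-JP for U+3042 (A4 A2) followed by U+3044 (A4 A4). */
static const unsigned char eucjp_text[] = { 0xA4, 0xA2, 0xA4, 0xA4 };

/* A two-byte sequence in the EUC-JP double-byte range, spanning both chars. */
static const unsigned char eucjp_needle[] = { 0xA2, 0xA4 };

/* UTF-8 for "a", U+00E9 (C3 A9), then U+00A9 (C2 A9). */
static const unsigned char utf8_text[] = { 0x61, 0xC3, 0xA9, 0xC2, 0xA9 };
static const unsigned char utf8_needle[] = { 0xC2, 0xA9 };
```

In EUC-JP, `find_bytes(eucjp_text, 4, eucjp_needle, 2)` returns 1, which falls in the middle of the first character. In UTF-8, the lone trail byte 0xA9 that the two characters share can never begin a valid needle. A search for the full character therefore lands at offset 3, and that byte is not a continuation byte.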

I also find the existing function naming convention somewhat annoying -
having foo() and MB_foo() is less than clear. I'd rather have SB_foo()
and MB_foo(). That's not your fault, of course.

If you supply me with some explanation on the UTF8 optimisation issue,
I'll prepare a revised patch along these lines.

cheers

andrew

ITAGAKI Takahiro wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
>
>>> I do not understand this patch. You have defined two functions,
>>> UTF8MatchText() and UTF8MatchTextIC(), and the difference between them
>>> is that one calls CHAREQ and the other calls ICHAREQ, but just above
>>> those two functions you define the macros identically:
>>>
>> Why are there two functions? Also, can't you use one function and just
>> pass a boolean to indicate whether case should be ignored?
>>
>
> The same is true of MBMatchText() and MBMatchTextIC().
> Now, I'll split the patch into two changes.
>
> 1. DropMBMatchTextIC.patch
> Drop MBMatchTextIC() and use MBMatchText() instead.
>
> 2. UTF8MatchText.patch
> Add UTF8MatchText() as a specialized version of MBMatchText().
>
>
> As future work, it might be worth measuring the performance of rewriting
> "col ILIKE 'pattern'" to "lower(col) LIKE lower('pattern')" in the planner, so
> that we can avoid calling lower() on a constant pattern on the right-hand side
> and can use functional indexes (lower(col)). I think we will never need
> MBMatchTextIC() in the future unless we move to a wide-character server encoding :)
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
>

In response to

Re: UTF8MatchText at 2007-04-09 02:48:44 from ITAGAKI Takahiro

Responses

Re: UTF8MatchText at 2007-05-17 16:56:55 from Andrew Dunstan
