
Re: UTF8MatchText

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: UTF8MatchText
Date:	2007-05-17 16:56:55
Message-ID:	464C8957.6070204@dunslane.net
Lists:	pgsql-hackers pgsql-patches

I wrote:
>
>
> ISTM we should generate all these match functions from one body of
> code plus some #define magic.
>
> As I understand it, we have three possible encoding switches: Single
> Byte, UTF8 and other Multi Byte Charsets, and two possible case
> settings: Case Sensitive and Case Insensitive. That would make for a
> total of six functions, but in the case of both UTF8 and other MBCS we
> don't need a special Case Insensitive function - instead we downcase
> both the string and the pattern and then use the Case Sensitive
> function. That leaves a total of four functions.
>
> What is not clear to me is why the UTF8 optimisation works, and why it
> doesn't apply to other MBCS. At the very least we need a comment on that.
>
> I also find the existing function naming convention somewhat annoying
> - having foo() and MB_foo() is less than clear. I'd rather have
> SB_foo() and MB_foo(). That's not your fault, of course.
>
> If you supply me with some explanation on the UTF8 optimisation issue,
> I'll prepare a revised patch along these lines.
>
>
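A minimal sketch of the "one body of code plus some #define magic" idea, in C. Everything in it is hypothetical: the macro `DEFINE_MATCH`, the helpers `sb_charlen`/`utf8_charlen`, and the names `SB_MatchText`, `SB_IMatchText`, `UTF8_MatchText` and `UTF8_IMatchText` borrow the SB_/MB_ naming suggested above but are not PostgreSQL's actual code. The LIKE semantics are simplified: '%' matches any run of characters, '_' exactly one character, and '\' escapes the next character.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Byte length of the character starting at s (hypothetical helpers). */
static size_t sb_charlen(const char *s) { (void) s; return 1; }

static size_t utf8_charlen(const char *s)   /* assumes valid UTF-8 */
{
    unsigned char c = (unsigned char) *s;
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;                               /* stray byte: step over it */
}

/* Character equality: exact bytes, or single-byte case folding. */
#define BYTES_EQ(a, b, n) (memcmp((a), (b), (n)) == 0)
#define SB_IEQ(a, b, n) (tolower((unsigned char) *(a)) == tolower((unsigned char) *(b)))

/* One LIKE body: '%' = any run, '_' = one char, '\' escapes the next char. */
#define DEFINE_MATCH(NAME, CHARLEN, CHAREQ)                              \
static int NAME(const char *t, size_t tlen, const char *p, size_t plen) \
{                                                                        \
    while (tlen > 0 && plen > 0) {                                       \
        size_t l;                                                        \
        if (*p == '%') {                                                 \
            p++, plen--;                                                 \
            if (plen == 0)                                               \
                return 1;              /* trailing % eats the rest */    \
            for (; tlen > 0; t += l, tlen -= l) {                        \
                if (NAME(t, tlen, p, plen))                              \
                    return 1;          /* rest matches from here */      \
                if ((l = CHARLEN(t)) > tlen)                             \
                    return 0;                                            \
            }                                                            \
            break;                     /* text used up */                \
        }                                                                \
        if (*p == '_') {                                                 \
            if ((l = CHARLEN(t)) > tlen)                                 \
                return 0;                                                \
            t += l, tlen -= l, p++, plen--;                              \
            continue;                                                    \
        }                                                                \
        if (*p == '\\' && (p++, --plen == 0))                            \
            return 0;                  /* dangling escape */             \
        l = CHARLEN(p);                                                  \
        if (l > tlen || l > plen || !CHAREQ(t, p, l))                    \
            return 0;                                                    \
        t += l, tlen -= l, p += l, plen -= l;                            \
    }                                                                    \
    while (plen > 0 && *p == '%')      /* leftover %s match empty */     \
        p++, plen--;                                                     \
    return tlen == 0 && plen == 0;                                       \
}

DEFINE_MATCH(SB_MatchText, sb_charlen, BYTES_EQ)
DEFINE_MATCH(SB_IMatchText, sb_charlen, SB_IEQ)
DEFINE_MATCH(UTF8_MatchText, utf8_charlen, BYTES_EQ)
/* An MB_MatchText would pass that encoding's own char-length function. */

/* Multibyte ILIKE: downcase both sides, reuse the case-sensitive body.
 * ASCII-only folding stands in for a real locale-aware lower(). */
static int UTF8_IMatchText(const char *t, size_t tlen, const char *p, size_t plen)
{
    char *lt = malloc(tlen + 1), *lp = malloc(plen + 1);
    int result;

    assert(lt != NULL && lp != NULL);       /* sketch: no real error handling */
    for (size_t i = 0; i < tlen; i++)
        lt[i] = (char) tolower((unsigned char) t[i]);
    for (size_t i = 0; i < plen; i++)
        lp[i] = (char) tolower((unsigned char) p[i]);
    result = UTF8_MatchText(lt, tlen, lp, plen);
    free(lt);
    free(lp);
    return result;
}
```

Only the character-length and character-equality hooks differ between instantiations. For example, `UTF8_MatchText` lets '_' step over both bytes of "ï" in `"na\xC3\xAFve"`, whereas `SB_MatchText` consumes a single byte there and then fails. The multibyte case-insensitive variant follows the approach described above: downcase the string and the pattern, then call the case-sensitive matcher.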

Ok, I have studied some more and I think I understand what's going on.
AIUI, we are switching from some expensive char-wise comparisons to
cheap byte-wise comparisons in the UTF8 case because we know that in
UTF8 the magic characters ('_', '%' and '\') never occur as bytes inside
any other character's byte sequence. Is that putting it too mildly? Do we need stronger
conditions than that? If it's correct, are there other MBCS for which
this is true?
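The UTF8 property in question can be checked mechanically. In UTF-8, every byte of a multibyte character has the high bit set (0x80 or above), so an ASCII byte such as '_', '%' or '\' can only ever stand for itself. That is not true of every multibyte encoding: in Shift_JIS, for instance, the kanji U+8868 is encoded as 0x95 0x5C, and 0x5C is '\'. The sketch below demonstrates both facts. The function names (`utf8_encode`, `utf8_ascii_bytes_are_unambiguous`, `first_backslash_bytewise`) are hypothetical illustrations, not PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode code point cp as UTF-8 into out; returns the byte count (1-4). */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}

/* Returns 1 if no non-ASCII code point encodes to any byte below 0x80,
 * i.e. '_', '%' and '\' bytes can only ever be those characters. */
static int utf8_ascii_bytes_are_unambiguous(void)
{
    unsigned char buf[4];

    for (unsigned long cp = 0x80; cp <= 0x10FFFF; cp++) {
        size_t n = utf8_encode(cp, buf);
        for (size_t i = 0; i < n; i++)
            if (buf[i] < 0x80)
                return 0;
    }
    return 1;
}

/* Counterexample: U+8868 in Shift_JIS is 0x95 0x5C, and 0x5C is '\'. */
static const unsigned char sjis_hyou[] = { 0x95, 0x5C };

/* Byte-wise scan for '\', as a byte-oriented matcher would do it. */
static ptrdiff_t first_backslash_bytewise(const unsigned char *s, size_t len)
{
    const unsigned char *hit = memchr(s, '\\', len);
    return hit ? hit - s : -1;
}
```

`utf8_ascii_bytes_are_unambiguous()` returns 1, while `first_backslash_bytewise(sjis_hyou, 2)` reports a '\' at offset 1, inside a single Shift_JIS character. A byte-wise matcher would misread that trail byte as an escape.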

cheers

andrew

In response to

Re: UTF8MatchText at 2007-05-17 14:20:50 from Andrew Dunstan

Responses

Re: UTF8MatchText at 2007-05-17 17:33:08 from Tom Lane
Re: UTF8MatchText at 2007-05-17 17:39:41 from Tom Lane
