Quick Links

Re: UTF8MatchText

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: UTF8MatchText
Date:	2007-05-18 03:06:05
Message-ID:	464D181D.9010307@dunslane.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-patches

Tom Lane wrote:
> ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
>
>> Yes, I only used the 'disjoint representations for first-bytes and
>> not-first-bytes of MB characters' feature in UTF8. Other encodings
>> allows both [AB] and [BA] for MB character patterns. UTF8Match() does
>> not cope with those encodings; If we have '[AB][AB]' in a table and
>> search it with LIKE '%[BA]%', we judge that they are matched by mistake.
>>
>
> AFAICS, the patch does *not* make that mistake because % will not
> advance over a fractional character.
>
>

Yeah, I think that's right.

Attached is my current WIP patch. If we decide that this optimisation
can in fact be applied to all backend encodings, that will be easily
incorporated. It will simplify the code further. Note that all the
common code in the MatchText and do_like_escape functions has been
factored - and the bytea functions just call the single-byte text
versions - AFAICS the effect will be identical to having the specialised
versions. (I'm always happy when code volume can be reduced.)

cheers

andrew

Attachment	Content-Type	Size
utf8.patch	text/x-patch	24.7 KB

In response to

Re: UTF8MatchText at 2007-05-18 02:35:06 from Tom Lane

Responses

Re: UTF8MatchText at 2007-05-18 03:31:22 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2007-05-18 03:31:22	Re: UTF8MatchText
Previous Message	Greg Smith	2007-05-18 03:02:31	Re: Not ready for 8.3

Browse pgsql-patches by date

	From	Date	Subject
Next Message	Tom Lane	2007-05-18 03:31:22	Re: UTF8MatchText
Previous Message	Tom Lane	2007-05-18 02:35:06	Re: UTF8MatchText