Re: UTF8MatchText

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	db(at)zigo(dot)dhs(dot)org
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject:	Re: UTF8MatchText
Date:	2007-05-21 13:34:23
Message-ID:	46519FDF.5070302@dunslane.net
Lists:	pgsql-hackers pgsql-patches

db(at)zigo(dot)dhs(dot)org wrote:
>> Doh, you're right ... but on third thought, what happens with a pattern
>> containing "%_"? If % tries to advance bytewise then we'll be trying to
>> apply NextChar in the middle of a data character, and bad things ensue.
>>
>
> Right, when you have '_' after a '%' you need to make sure the '%'
> advances by full characters. In my suggestion, the test for whether '_'
> (or '\') comes after the '%' is done once, and it selects which of the
> two loops to use: the one that steps bytewise or the one that uses
> NextChar.
>
> It's difficult to know for sure that we have thought about all the corner
> cases. I hope the gain is worth the effort.. :-)
>
>
>

Yes, I came to the same conclusion about how to restructure the code.

The current code contains this:

    while (tlen > 0)
    {
        /*
         * Optimization to prevent most recursion: don't recurse
         * unless first pattern char might match this text char.
         */
        if (CHAREQ(t, p) || (*p == '\\') || (*p == '_'))
        {
            int matched = MatchText(t, tlen, p, plen);

            if (matched != LIKE_FALSE)
                return matched; /* TRUE or ABORT */
        }

        NextChar(t, tlen);
    }

The code appears to date from v1.23 of like.c, way back in 2001. I'm not
sure I agree with the comment, though. In the first place, the tests on
*p ('\\' and '_') are loop invariants and should not be inside the loop,
I think, so I'll hoist them out as Dennis suggests. But why are we doing
that CHAREQ at all? If it succeeds, we'll just do the same comparison
again when we recurse, I think.
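
As a rough illustration of the restructuring discussed above, here is a
minimal, self-contained sketch. It is not the like.c code: like_match and
utf8_len are hypothetical names, the matcher assumes valid UTF-8 input,
returns only 1 or 0 (there is no LIKE_ABORT), and compares literals byte
by byte. The test for '_' or '\' after the '%' is done once and selects
between a loop that advances by whole characters and a loop that advances
bytewise. The first-byte check is kept only in the bytewise loop; whether
it earns its keep is exactly the open question about CHAREQ.

```c
#include <stddef.h>

/* Hypothetical helper: byte length of a UTF-8 character from its lead byte. */
static int utf8_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;                   /* invalid lead byte: step a single byte */
}

/* Simplified LIKE matcher (assumption: valid UTF-8). Returns 1 or 0. */
static int like_match(const char *t, int tlen, const char *p, int plen)
{
    while (tlen > 0 && plen > 0)
    {
        if (*p == '%')
        {
            /* Collapse a run of '%'; a trailing '%' matches the rest. */
            while (plen > 0 && *p == '%') { p++; plen--; }
            if (plen <= 0)
                return 1;

            /* Loop-invariant test, done once before choosing a loop. */
            if (*p == '_' || *p == '\\')
            {
                /* Wildcard or escape follows: advance by whole characters. */
                while (tlen > 0)
                {
                    int n;

                    if (like_match(t, tlen, p, plen))
                        return 1;
                    n = utf8_len((unsigned char) *t);
                    if (n > tlen) n = tlen;
                    t += n; tlen -= n;
                }
            }
            else
            {
                /*
                 * Literal follows: bytewise stepping is safe in UTF-8,
                 * because a character's first byte never equals a
                 * continuation byte. Recurse only on a first-byte match.
                 */
                while (tlen > 0)
                {
                    if (*t == *p && like_match(t, tlen, p, plen))
                        return 1;
                    t++; tlen--;
                }
            }
            return 0;
        }
        else if (*p == '_')
        {
            /* '_' consumes exactly one (possibly multibyte) character. */
            int n = utf8_len((unsigned char) *t);

            if (n > tlen) n = tlen;
            t += n; tlen -= n;
            p++; plen--;
        }
        else
        {
            if (*p == '\\')
            {
                /* Escape: the next pattern byte is matched literally. */
                p++; plen--;
                if (plen <= 0)
                    return 0;
            }
            if (*p != *t)
                return 0;
            p++; plen--;
            t++; tlen--;
        }
    }

    if (tlen > 0)
        return 0;               /* pattern ran out before the text did */

    /* Text is exhausted: only '%' may remain in the pattern. */
    while (plen > 0 && *p == '%') { p++; plen--; }
    return plen <= 0;
}
```

With this shape, a pattern such as 'a%_b' takes the character-stepping
loop, so '_' is never applied in the middle of a multibyte character,
while 'a%c' takes the cheaper bytewise loop.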

cheers

andrew

In response to

Re: UTF8MatchText at 2007-05-21 05:58:04 from db

Responses

Re: UTF8MatchText at 2007-05-21 13:43:47 from Tom Lane