Re: UTF8MatchText

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: UTF8MatchText
Date: 2007-05-20 16:58:05
Message-ID: 2132.1179680285@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Are you sure? The big remaining char-matching bottleneck will surely
> be in the code that scans for a place to start matching a %. But
> that's exactly where we can't use byte matching for cases where the
> charset might include AB and BA as characters - the pattern might
> contain %BA and the string AB. However, this isn't a danger for UTF8,
> which leads me to think that we do indeed need a special case for
> UTF8, but for a different improvement from that proposed in the
> original patch. I'll post an updated patch shortly.

> Here is a patch that implements this. Please analyse for possible
> breakage.

On the strength of this analysis, shouldn't we drop the separate
UTF8 match function and just use SB_MatchText for UTF8?
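To illustrate the point being discussed (this sketch is an editorial addition, not part of the patch): in UTF-8, continuation bytes always have the bit pattern 10xxxxxx and lead bytes never do, so a byte-wise match of a whole character can only begin on a character boundary. In an encoding where a trailing byte can equal a leading byte, such as the AB/BA case quoted above, a byte-wise scan can match in the middle of a character. The helper names below (`utf8_is_continuation`, `bytewise_find`, `utf8_on_boundary`) are hypothetical and do not come from the PostgreSQL source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
bool
utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Byte offset of the first byte-wise occurrence of needle in haystack,
 * or -1 if there is none.  No knowledge of the encoding is used.
 */
long
bytewise_find(const char *haystack, const char *needle)
{
    const char *hit;

    assert(haystack != NULL && needle != NULL);
    hit = strstr(haystack, needle);
    return hit ? (long) (hit - haystack) : -1;
}

/*
 * True if offset off (within the string s) is the start of a character,
 * assuming s is valid UTF-8.
 */
bool
utf8_on_boundary(const char *s, long off)
{
    assert(s != NULL);
    return off >= 0 && !utf8_is_continuation((unsigned char) s[off]);
}
```

With a hypothetical fixed two-byte encoding, the data `"ABAB"` holds the character AB twice, yet `bytewise_find("ABAB", "BA")` reports a hit at offset 1, which is inside the first character. With UTF-8 data, searching for the two bytes of one character lands on a lead byte, so `utf8_on_boundary` holds for the reported offset.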

It strikes me that we may be overcomplicating matters in another way
too. If you believe that the %-scan code is now the bottleneck, that
is, the key loop is where we have pattern '%foo' and we are trying to
match 'f' to each successive data position, then you should be bothered
that SB_MatchTextIC is applying tolower() to 'f' again for each data
character. Worst-case we could have O(N^2) applications of tolower()
during a match. I think there's a fair case to be made that we should
get rid of SB_MatchTextIC and implement *all* the case-insensitive
variants by means of an initial lower() call. This would leave us with
just two match functions and allow considerable unification of the setup
logic.
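A minimal sketch of the "lower once, then match byte-wise" idea, assuming a single-byte encoding and ignoring escape characters (this is an editorial illustration, not the patch under discussion). The names `like_match`, `fold_lower`, and `ilike_match` are hypothetical and are not the PostgreSQL functions. Each input byte passes through tolower() exactly once, instead of the pattern byte being re-folded at every candidate data position inside the %-scan loop.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Byte-wise LIKE matcher: '%' matches any run of bytes, '_' matches one
 * byte.  Case-sensitive; no escape handling.
 */
bool
like_match(const char *t, const char *p)
{
    while (*p != '\0')
    {
        if (*p == '%')
        {
            /* Collapse consecutive '%' signs. */
            while (*p == '%')
                p++;
            if (*p == '\0')
                return true;    /* trailing % matches the rest */

            /* The %-scan loop: try each successive data position. */
            for (; *t != '\0'; t++)
            {
                if ((*p == '_' || *p == *t) && like_match(t, p))
                    return true;
            }
            return false;
        }
        if (*t == '\0')
            return false;       /* pattern left over, data exhausted */
        if (*p != '_' && *p != *t)
            return false;
        t++;
        p++;
    }
    return *t == '\0';
}

/* Return a malloc'd copy of s with every byte folded by tolower(). */
char *
fold_lower(const char *s)
{
    size_t n = strlen(s);
    char *out = malloc(n + 1);

    assert(out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = (char) tolower((unsigned char) s[i]);
    out[n] = '\0';
    return out;
}

/* Case-insensitive match: fold both inputs once, then match byte-wise. */
bool
ilike_match(const char *t, const char *p)
{
    char *lt = fold_lower(t);
    char *lp = fold_lower(p);
    bool result = like_match(lt, lp);

    free(lt);
    free(lp);
    return result;
}
```

For example, `ilike_match("Hello FOO", "%foo")` is true while `like_match("xxFOO", "%foo")` is false. The recursive matcher here can backtrack heavily on patterns with many '%' signs; it is kept simple only to show where the folding happens.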

regards, tom lane
