Re: UTF8MatchText

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: UTF8MatchText
Date: 2007-05-20 16:58:05
Message-ID: 2132.1179680285@sss.pgh.pa.us
Lists: pgsql-hackers, pgsql-patches

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Are you sure? The big remaining char-matching bottleneck will surely 
> be in the code that scans for a place to start matching a %. But 
> that's exactly where we can't use byte matching for cases where the 
> charset might include AB and BA as characters - the pattern might 
> contain %BA and the string AB. However, this isn't a danger for UTF8, 
> which leads me to think that we do indeed need a special case for 
> UTF8, but for a different improvement from that proposed in the 
> original patch. I'll post an updated patch shortly.

> Here is a patch that implements this. Please analyse for possible 
> breakage.

On the strength of this analysis, shouldn't we drop the separate
UTF8 match function and just use SB_MatchText for UTF8?
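
To illustrate the quoted AB/BA concern, here is a minimal sketch in C. It is
not code from the patch: byte_find() and utf8_is_continuation() are
hypothetical helper names, and the "two-byte charset" is an invented stand-in
for any encoding whose trail bytes can also be lead bytes. It shows a
byte-wise search matching in the middle of a character in such a charset,
which cannot happen with valid UTF-8 because continuation bytes (10xxxxxx)
never begin a character.

```c
#include <stddef.h>
#include <string.h>

/*
 * Plain byte-wise substring search (hypothetical helper, not from the patch).
 * Returns the byte offset of the first match, or -1 if there is none.
 */
static long
byte_find(const unsigned char *hay, size_t haylen,
          const unsigned char *needle, size_t nlen)
{
    size_t i;

    if (nlen > haylen)
        return -1;
    for (i = 0; i + nlen <= haylen; i++)
    {
        if (memcmp(hay + i, needle, nlen) == 0)
            return (long) i;
    }
    return -1;
}

/* True if c is a UTF-8 continuation byte, i.e. has the form 10xxxxxx. */
static int
utf8_is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/*
 * In an assumed two-byte charset where "AB" and "BA" are both characters,
 * the string AB AB has bytes A B A B, so searching for the character BA
 * byte-wise reports offset 1, which is inside the first character.
 *
 * In UTF-8, a valid pattern starts with a non-continuation byte, so any
 * byte-wise match in valid data must begin on a character boundary.
 */
```

For example, searching the bytes 'a' C3 A9 C3 A9 for C3 A9 finds offset 1,
and the byte at that offset is not a continuation byte.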

It strikes me that we may be overcomplicating matters in another way
too.  If you believe that the %-scan code is now the bottleneck, that
is, the key loop is where we have pattern '%foo' and we are trying to
match 'f' to each successive data position, then you should be bothered
that SB_MatchTextIC is applying tolower() to 'f' again for each data
character.  In the worst case we could have O(N^2) applications of tolower()
during a match.  I think there's a fair case to be made that we should
get rid of SB_MatchTextIC and implement *all* the case-insensitive
variants by means of an initial lower() call.  This would leave us with
just two match functions and allow considerable unification of the setup
logic.
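
A rough sketch of the difference, in C. This is not the actual SB_MatchTextIC
code: all function names are hypothetical, only the '%foo' scan for a literal
is modelled, and the call counter exists purely to make the cost visible.
The first search folds case at every comparison; the second lowers both
inputs once and then compares bytes.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counter, present only to make the tolower() cost visible. */
static unsigned long tolower_calls = 0;

static unsigned char
counted_tolower(unsigned char c)
{
    tolower_calls++;
    return (unsigned char) tolower(c);
}

/*
 * Search for lit in data, folding case at every single comparison.
 * Returns the offset of the first match, or -1.
 */
static long
find_fold_each_time(const char *data, size_t datalen,
                    const char *lit, size_t litlen)
{
    size_t i, j;

    if (litlen > datalen)
        return -1;
    for (i = 0; i + litlen <= datalen; i++)
    {
        for (j = 0; j < litlen; j++)
        {
            /* The pattern byte is re-lowered for each data position. */
            if (counted_tolower((unsigned char) data[i + j]) !=
                counted_tolower((unsigned char) lit[j]))
                break;
        }
        if (j == litlen)
            return (long) i;
    }
    return -1;
}

/* Lower-case a buffer in place, one tolower() call per byte. */
static void
lower_in_place(char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        s[i] = (char) counted_tolower((unsigned char) s[i]);
}

/* Byte-wise search over inputs that were already lowered once. */
static long
find_prelowered(const char *data, size_t datalen,
                const char *lit, size_t litlen)
{
    size_t i;

    if (litlen > datalen)
        return -1;
    for (i = 0; i + litlen <= datalen; i++)
    {
        if (memcmp(data + i, lit, litlen) == 0)
            return (long) i;
    }
    return -1;
}
```

With the pre-lowered variant the number of tolower() calls is fixed at one
per input byte, however many positions the scan has to try; the per-comparison
variant pays for every attempted position. For very short inputs the up-front
pass may not win, so this is only meant to show the shape of the cost.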

			regards, tom lane
