
Re: UTF8MatchText

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: UTF8MatchText
Date: 2007-05-20 13:28:29
Message-ID: 46504CFD.5040505@dunslane.net
Lists: pgsql-hackers, pgsql-patches

Dennis Bjorklund wrote:
> Tom Lane wrote:
>> You could imagine trying to do
>> % a byte at a time (and indeed that's what I'd been thinking it did)
>> but that gets you out of sync which breaks the _ case.
>
> It is only when you have a pattern like '%_' that this is a problem,
> and we could detect this and go byte by byte when it's not. Currently
> we check (*p == '\\') || (*p == '_') in each iteration when we scan
> over characters for '%'; we could do the check once and have different
> loops for the two cases.
>
> Other than this part that I think can be optimized I don't see 
> anything wrong with the idea behind the patch. To make the '%' case 
> fast might be an important optimization for a lot of use cases. It's 
> not uncommon that '%' matches a bigger part of the string than the 
> rest of the pattern.
>
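
The "check once" idea in the quoted text could be sketched roughly as
below. This is only an illustration, not code from the patch: the
function name pattern_needs_char_steps is hypothetical, and it assumes
an encoding in which the bytes for '_' and '\\' never occur inside a
multibyte character.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: scan the pattern once, up front.
 * Returns 1 if it contains '_' or '\\', in which case the '%' scan
 * has to stay character-aligned; returns 0 if a byte-by-byte loop
 * could be considered instead.
 */
static int
pattern_needs_char_steps(const char *pat, size_t len)
{
    assert(pat != NULL);

    for (size_t i = 0; i < len; i++)
    {
        if (pat[i] == '_' || pat[i] == '\\')
            return 1;
    }
    return 0;
}
```

A caller would run this once per pattern and then pick one of two scan
loops, instead of testing both characters on every iteration.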


Are you sure? The big remaining char-matching bottleneck will surely be
in the code that scans for a place to resume matching after a %. But
that's exactly where we can't use byte matching for charsets that might
include both AB and BA as two-byte characters: the pattern might contain
%BA and the string a run of AB characters, so a byte-wise scan would
find BA straddling two of them. However, this isn't a danger for UTF8,
which leads me to think that we do indeed need a special case for UTF8,
but for a different improvement from that proposed in the original
patch. I'll post an updated patch shortly.
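
To make the AB/BA hazard concrete, here is a minimal sketch. It is not
taken from the patch; byte_find and utf8_is_char_start are hypothetical
names, and the two-byte encoding in the usage note is made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Naive byte-wise search: offset of the first occurrence of needle
 * in hay, or -1 if there is none. It knows nothing about character
 * boundaries.
 */
static long
byte_find(const unsigned char *hay, size_t hay_len,
          const unsigned char *needle, size_t needle_len)
{
    assert(needle_len > 0);

    for (size_t i = 0; i + needle_len <= hay_len; i++)
    {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long) i;
    }
    return -1;
}

/*
 * In UTF-8, continuation bytes have the form 10xxxxxx; any other
 * byte begins a character.
 */
static int
utf8_is_char_start(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}
```

In a made-up fixed two-byte encoding, the string bytes 0x41 0x42 0x41
0x42 are the two characters AB AB, yet byte_find reports the bytes
0x42 0x41 (the character BA) at offset 1, in the middle of a character.
In UTF-8 that cannot happen for a whole character: its first byte is
never a continuation byte, so a byte-level match always begins where
utf8_is_char_start is true.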

cheers

andrew
