Re: strpos() && KMP

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Ajtkulov <ajtkulov(at)acm(dot)org>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: strpos() && KMP
Date: 2007-08-09 02:24:20
Message-ID: 12142.1186626260@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

Pavel Ajtkulov <ajtkulov(at)acm(dot)org> writes:
> Tom Lane writes:
>> The difficulty with B-M is the need for a table indexed by character
>> code, which at first glance looks impractical for wchars. But it seems
>> to me that we could use "wchar % 256" as the table index, meaning that
>> wchars with the same trailing byte share the same table entry.

> hash table?

I'd think the cost of hashing would render it impractical. Most of the
papers I've seen on this topic worry about getting single instructions
out of the search loop --- a hash lookup will cost lots more than that.
Moreover, you'd lose the guarantee of not-worse-than-linear time,
because hash lookup can be pathologically bad if you get a lot of hash
collisions.

> The main difficulty with BM is verification and understanding "good
> suffix shift" (the second part of BM) (I don't understand it entirely).

Yeah, there seem to be a bunch of variants of BM (many of them not
guaranteed linear, which I'm sure we don't want) and the earliest
papers had bugs. But a good implementation would be a lot easier
sell because it would show benefits for a much wider set of use-cases
than KMP.

regards, tom lane

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2007-08-09 02:35:24 Re: Default location for docbook stylesheets in Gentoo
Previous Message Brendan Jurd 2007-08-08 19:02:57 Re: Default location for docbook stylesheets in Gentoo