Skip site navigation (1) Skip section navigation (2)

Re: strpos() && KMP

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Ajtkulov <ajtkulov(at)acm(dot)org>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: strpos() && KMP
Date: 2007-08-09 02:24:20
Message-ID: 12142.1186626260@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-patches
Pavel Ajtkulov <ajtkulov(at)acm(dot)org> writes:
> Tom Lane writes:
>> The difficulty with B-M is the need for a table indexed by character
>> code, which at first glance looks impractical for wchars.  But it seems
>> to me that we could use "wchar % 256" as the table index, meaning that
>> wchars with the same trailing byte share the same table entry.

> hash table?

I'd think the cost of hashing would render it impractical.  Most of the
papers I've seen on this topic worry about getting single instructions
out of the search loop --- a hash lookup will cost lots more than that.
Moreover, you'd lose the guarantee of not-worse-than-linear time,
because hash lookup can be pathologically bad if you get a lot of hash
collisions.

> The main difficulty with BM is verification and understanding "good
> suffix shift" (the second part of BM) (I don't understand it entirely).

Yeah, there seem to be a bunch of variants of BM (many of them not
guaranteed linear, which I'm sure we don't want) and the earliest
papers had bugs.  But a good implementation would be a lot easier
sell because it would show benefits for a much wider set of use-cases
than KMP.

			regards, tom lane

In response to

Responses

pgsql-patches by date

Next:From: Tom LaneDate: 2007-08-09 02:35:24
Subject: Re: Default location for docbook stylesheets in Gentoo
Previous:From: Brendan JurdDate: 2007-08-08 19:02:57
Subject: Re: Default location for docbook stylesheets in Gentoo

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group