Pavel Ajtkulov <ajtkulov(at)acm(dot)org> writes:
> Tom Lane writes:
>> The difficulty with B-M is the need for a table indexed by character
>> code, which at first glance looks impractical for wchars. But it seems
>> to me that we could use "wchar % 256" as the table index, meaning that
>> wchars with the same trailing byte share the same table entry.
> hash table?
I'd think the cost of hashing would render it impractical. Most of the
papers I've seen on this topic worry about getting single instructions
out of the search loop --- a hash lookup will cost lots more than that.
Moreover, you'd lose the guarantee of not-worse-than-linear time,
because hash lookup can be pathologically bad if you get a lot of hash
> The main difficulty with BM is verification and understanding "good
> suffix shift" (the second part of BM) (I don't understand it entirely).
Yeah, there seem to be a bunch of variants of BM (many of them not
guaranteed linear, which I'm sure we don't want) and the earliest
papers had bugs. But a good implementation would be a lot easier
sell because it would show benefits for a much wider set of use-cases
regards, tom lane
In response to
pgsql-patches by date
|Next:||From: Tom Lane||Date: 2007-08-09 02:35:24|
|Subject: Re: Default location for docbook stylesheets in Gentoo |
|Previous:||From: Brendan Jurd||Date: 2007-08-08 19:02:57|
|Subject: Re: Default location for docbook stylesheets in Gentoo|