Re: [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Heikki Linnakangas'" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'Peter Eisentraut'" <peter_e(at)gmx(dot)net>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)
Date: 2008-09-05 17:02:54
Message-ID: 2F2A1443F6064162BECA6BB37625222E@amd64
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Heikki Linnakangas wrote:
> After reading the wikipedia article on Boyer-Moore search algorithm, it
> looks to me like this patch actually implements the simpler
> Boyer-Moore-Horspool algorithm that only uses one lookup table. That's
> probably fine, as it ought to be faster on small needles and haystacks
> because it requires less effort to build the lookup tables, even though
> the worst-case performance is worse. It should still be faster than what
> we have now.

Yes, correct, I really didn't want to slow down smaller searches. This
method seemed to fit the bill better. What I didn't realise is that this
method also had a name.

Tom Lane wrote:
> Hmm. B-M-H has worst case search speed O(M*N) (where M = length of
> pattern, N = length of search string); whereas full B-M is O(N). Maybe we
> should build the second table when M is large?

I'll look into this. If it pays off for longer searches I'll submit a patch.
I won't have the time until after the 15th, so perhaps that's in November's
commit fest?

> The skip table really should be constructed only once in
> text_position_start and stored in TextPositionState. That would make a
> big difference to the performance of those functions that call
> text_position_next repeatedly: replace_text, split_text and text_to_array.

Of course you are right. That will help for replace and the like. I'll
update the patch tonight.

Thanks both for the feedback.

David.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Chernow 2008-09-05 17:09:40 CVS head has broken make
Previous Message Tom Lane 2008-09-05 17:02:17 Re: 8.4devel out of memory

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2008-09-05 17:12:17 Re: [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)
Previous Message Alex Hunsaker 2008-09-05 16:12:01 Re: hash index improving v3