Re: [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Heikki Linnakangas'" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "'Peter Eisentraut'" <peter_e(at)gmx(dot)net>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)
Date: 2008-09-06 23:25:07
Message-ID: 9CD35557A45746D183E72D71802749D4@amd64
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> I looked this over a bit and was immediately confused by one thing:
> the introductory comment says that the skip table size ought to be based
> on the length of the haystack, which makes sense to me, but the code is
> actually initializing it on the basis of len2, ie, the length of the
> needle. Isn't that a bug? Was the same bug present in the tests you
> made to determine the best table sizes?

Yes Bug. That's what I get for making last minute changes. It didn't affect
the benchmark, that was done before moving the code into postgresql for
testing. The function I tested with had an extra parameter to set the skip
table size. Each possible size was tested and the best time was saved along
with the best size.

> BTW, to the extent that you feel like testing a different idea,
> I would suggest:

> * don't bother initializing the skiptable when len1 < len2

Good plan.

> * otherwise, choose its size based on len1 - len2, not just len1 or
> len2. This is (one less than) the maximum number of search loop
> consultations of the skip table that can happen, so it seems like a
> plausible number, and better than either length alone.

That seems like a better idea. I had considered len1 * len2, giving that's
the worst case for BMH. Of course the lengths would need to be raised quite
a bit. I'll go with len1 - len2

I'll make the above changes and then run my benchmark again to see if the
skip table size logic should be updated.

I'll also benchmark and update the benchmark spreadsheet to see what's
changed.

David.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Abbas Butt 2008-09-06 23:52:30 Re: Need more reviewers!
Previous Message Tom Lane 2008-09-06 22:50:47 Re: [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2008-09-07 00:10:04 Re: [PgFoundry] Unsigned Data Types [1 of 2]
Previous Message Jaime Casanova 2008-09-06 23:22:10 Re: [PgFoundry] Unsigned Data Types [1 of 2]