Quick Links

Re: [PATCHES] TODO item: Implement Boyer-Moore searching (First time hacker)

From:	"David Rowley" <dgrowley(at)gmail(dot)com>
To:	"'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Heikki Linnakangas'" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "'Peter Eisentraut'" <peter_e(at)gmx(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [PATCHES] TODO item: Implement Boyer-Moore searching (First time hacker)
Date:	2008-09-07 02:03:57
Message-ID:	2B14F8402C8E4A3C9E240E3DAF31BA25@amd64
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-patches

<<<Moved from pgsql-patches>>>

Tom Lane wrote:
> I wrote:
> > I looked this over a bit and was immediately confused by one thing:
> > the introductory comment says that the skip table size ought to be based
> > on the length of the haystack, which makes sense to me, but the code is
> > actually initializing it on the basis of len2, ie, the length of the
> > needle. Isn't that a bug? Was the same bug present in the tests you
> > made to determine the best table sizes?
>
> BTW, to the extent that you feel like testing a different idea,
> I would suggest:
>
> * don't bother initializing the skiptable when len1 < len2
>
> * otherwise, choose its size based on len1 - len2, not just len1 or
> len2. This is (one less than) the maximum number of search loop
> consultations of the skip table that can happen, so it seems like a
> plausible number, and better than either length alone.

I've made the discussed changes. Also updated the benchmark results.
http://www.unixbeast.com/~fat/8.3_test_v1.3.xls

On re-benchmarking to determine the best skip table size, the changes to the
logic didn't seem to affect the results enough to have to change the
thresholds. But I'm sure it will perform a little better in cases like when
both the needle and haystack are very long and close to being the same
length. Having a big skip table in this case is a little bit of a waste as
the search won't get to make much use of it.

Attachment	Content-Type	Size
boyer-moore-strpos_v1.3.patch	application/octet-stream	7.9 KB

In response to

Re: [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker) at 2008-09-06 22:50:47 from Tom Lane

Responses

Re: [PATCHES] TODO item: Implement Boyer-Moore searching (First time hacker) at 2008-09-07 04:26:47 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bruce Momjian	2008-09-07 02:09:17	Re: Synchronous Log Shipping Replication
Previous Message	Tom Lane	2008-09-07 00:55:48	Re: Review Report: propose to include 3 new functions into intarray and intagg

Browse pgsql-patches by date

	From	Date	Subject
Next Message	Alex Hunsaker	2008-09-07 02:23:05	Re: hash index improving v3
Previous Message	Tom Lane	2008-09-07 00:10:04	Re: [PgFoundry] Unsigned Data Types [1 of 2]