Skip site navigation (1) Skip section navigation (2)

Re: [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

From: "David Rowley" <dgrowley(at)gmail(dot)com>
To: "'Peter Eisentraut'" <peter_e(at)gmx(dot)net>,<pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)
Date: 2008-09-02 23:19:26
Message-ID: CC00B041B12F40749EE43D49D99A705D@amd64 (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-patches
I wrote: 
> I look forward to receiving feedback on this.

Peter Eisentraut wrote:
> Please send a patch in diff -c format.  And don't put a single patch 
> file in a tar file.

Thanks for the pointer. I'm quite new to this.


I've done some more revisions to the patch. This has mostly just involved
tuning the skip table size based on the size of the search. This has
basically involved lots of benchmarks with different strings and calculating
the best size of table to use. The reason for this is to maintain fast
searches for smaller strings. The overhead of initialising a 256 element
array would probably out weigh the cost of the search if this were not done.
The size of the skip table increases with longer strings, or rather the size
that is utilised.

Performance:
For smaller searches performance of the patch and existing version are very
similar. The patched version starts to out perform the existing version when
the needle and haystack become larger.

The patch wins hands down with searches that leads the existing function in
to dead ends, for example:

SELECT STRPOS('A AA AAA AAAA AAAAA AAAAAA AAAAAAA','AAAAAAA');

When searching for very small strings, say just a single character in a
sentence the existing function marginally beats the patched version.

Outside of Postgres I've done benchmarks where I've searched for every
combination of the search string in the search string. Like:

test     | t      |
test     | te     |
test     | tes    |
test     | test   |
test     | e      |
test     | es     |
test     | est    |
test     | s      |
test     | st     |
test     | t      |


I felt this was fair for both versions. The patched version beat the
unpatched version. The test I carried out was a string of 934 characters.

I can upload more benchmarks if required.

I'm quite happy with the patch now so I'm going to submit it (in diff -c
format this time :)

David.




Attachment: boyer-moore_strpos_v1.1.patch
Description: application/octet-stream (7.1 KB)

In response to

Responses

pgsql-hackers by date

Next:From: Gregory StarkDate: 2008-09-02 23:25:06
Subject: Re: Question regarding the database page layout.
Previous:From: Marko KreenDate: 2008-09-02 22:44:06
Subject: Re: [PATCH] Cleanup of GUC units code

pgsql-patches by date

Next:From: Alvaro HerreraDate: 2008-09-03 00:19:20
Subject: Re: libpq object hooks (libpq events)
Previous:From: Tom LaneDate: 2008-09-02 17:18:37
Subject: Re: WIP Join Removal

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group