Re: TODO item: Implement Boyer-Moore searching in LIKE queries

From: Karan Sikka <karanssikka(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TODO item: Implement Boyer-Moore searching in LIKE queries
Date: 2016-08-02 18:45:10
Message-ID: CALkFZpcdRs90SxHfLLbV1ONatUnwYpBgXGt_yW008bKYcv62RQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Yeah, you make a fair point.

I was hoping more people would chime in with strong opinions to make the
decision easier, but perhaps the lack of noise indicates this is a low-pri
feature.

I will ask around in my coworker circles to see what they think,
could you do the same?

Just for the record, I'm leaning to the side of "not worth it". My thoughts
are,
if you are at a scale where you care about this 20% speedup, you would want
to go all the way to an indexed structure, because the gains you would
realize
would exceed 20%, and 20% may not be a sufficient speedup anyway.

On Tue, Aug 2, 2016 at 1:56 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
wrote:

> Robert Haas wrote:
> > On Mon, Aug 1, 2016 at 1:19 PM, Karan Sikka <karanssikka(at)gmail(dot)com>
> wrote:
> > > Following the patch to implement strpos with Boyer-Moore-Horspool,
> > > it was proposed we bring BMH to LIKE as well.
> > >
> > > Original thread:
> > >
> https://www.postgresql.org/message-id/flat/27645(dot)1220635769%40sss(dot)pgh(dot)pa(dot)us#27645(dot)1220635769(at)sss(dot)pgh(dot)pa(dot)us
> > >
> > > I'm a first time hacker and I found this task on the TODO list. It
> seemed
> > > interesting and isolated enough. But after looking at the code in
> > > like_match.c, it's clearly a non-trivial task.
> > >
> > > How desirable is this feature? To begin answering that question,
> > > I did some initial benchmarking with an English text corpus to find
> how much
> > > faster BMH (Boyer-Moore-Horspool) would be compared to LIKE, and the
> results
> > > were as follows: BMH can be up to 20% faster than LIKE, but for short
> > > substrings, it's roughly comparable or slower.
> > >
> > > Here are the results visualized:
> > >
> > > http://ctrl-c.club/~ksikka/pg/like-strpos-short-1469975400.png
> > > http://ctrl-c.club/~ksikka/pg/like-strpos-long-1469975400.png
> >
> > Based on these results, this looks to me like a pretty unexciting
> > thing upon which to spend time.
>
> Uh, a 20% different is "unexciting" for you? I think it's interesting.
> Now, really, users shouldn't be running LIKE on constant strings all the
> time but rather use some sort of indexed search, but once in a while
> there is a need to run some custom query and you need to LIKE-scan a
> large portion of a table. For those cases an algorithm that performs
> 20% better is surely welcome.
>
> I wouldn't be so quick to dismiss this.
>
> Of course, it needs to work in all cases, or failing that, be able to
> fall back to the original code if it cannot support some corner case.
>
> --
> Álvaro Herrera http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-08-02 18:55:06 Re: TODO item: Implement Boyer-Moore searching in LIKE queries
Previous Message Tom Lane 2016-08-02 18:38:19 Re: PostmasterContext survives into parallel workers!?