Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)
Date: 2010-10-13 05:49:27
Message-ID: AANLkTim-eBc+FtOYZ+zRfi9t_m6gHsxWYUR6fX=qvu3W@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 8, 2010 at 12:51 PM, Alexander Korotkov
<aekorotkov(at)gmail(dot)com> wrote:
> Sorry, I'm pretty unconversant in git. Now, it should be ok.

I spent some time hacking on this. It doesn't appear to be too easy
to get levenshtein_less_equal() working without slowing down plain old
levenshtein() by about 6%. The slowdown was similar on the 0.2
version of the patch, the 0.3.1 version of the patch, and another
version I put together that's a bit different from either. The test
query was:

SELECT * FROM words WHERE levenshtein(a, 'extensize') <= 2;

In the version I was experimenting with, I had a large chunk of code
that looked like this:

if (max_d >= 0)
{
/* Do stuff */
}

Removing that block of code buys back most of the performance loss
that occurs in the case where max_d < 0. I have to conclude that
sticking more stuff into the main loop defeats some optimization
opportunities that would otherwise be available, even in the case
where the branch isn't taken.

So I'm tempted to employ the previously-discussed sledgehammer of
compiling levenshtein_internal twice, with and without support for
max_d. A patch taking this approach is attached. I'm not totally
sold on this approach, if someone has a constructive suggestion.
Comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
levenshtein_less_equal-0.4.patch application/octet-stream 23.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-10-13 06:08:56 Re: WIP: extensible enums
Previous Message Itagaki Takahiro 2010-10-13 05:44:08 Re: Bug / shortcoming in has_*_privilege