Skip site navigation (1) Skip section navigation (2)

Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)
Date: 2010-09-24 20:01:24
Message-ID: AANLkTimfB+ebz_FC6Uvc=aSkkiGjMvDKzX2zdbS2CUUM@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-hackers
On Sat, Aug 28, 2010 at 8:34 AM, Alexander Korotkov
<aekorotkov(at)gmail(dot)com> wrote:
> Here is the patch which adds levenshtein_less_equal function. I'm going to
> add it to current commitfest.

There are some minor stylistic issues with this patch - e.g. lines
ending in whitespace, cuddled elses - but those don't look too
terribly difficult to fix.  I'm more concerned about the fact that I
don't really understand the algorithm you're using.  Actually, I
didn't really understand the original algorithm either until I went
and read up on it, and I just adjusted the comments to make it a bit
more clear what it's doing.  That caused some minor merge conflicts
with your patch, so I'm attaching a rebased version that applies
cleanly over my changes.

Can you explain a bit more what algorithm this is using?  It seems
like in the max_d >= 0 case the cells of the notional array have a
meaning which is completely different from what they mean in
otherwise, and it's not clear to me from reading the comments what
that meaning is.  I took a look on that font of all human knowledge,
Wikipedia:

http://en.wikipedia.org/wiki/Levenshtein_distance

Their suggestion for handling this case is:

"If we are only interested in the distance if it is smaller than a
threshold k, then it suffices to compute a diagonal stripe of width
2k+1 in the matrix. In this way, the algorithm can be run in O(kl)
time, where l is the length of the shortest string."

It seems like that may be similar to what you're doing here but I
don't think that's exactly it.  I don't think that exact thing would
work in our case anyhow because we've got configurable costs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachment: levenshtein_less_equal-0.2.patch
Description: application/octet-stream (15.0 KB)

Responses

pgsql-hackers by date

Next:From: Andrew DunstanDate: 2010-09-24 20:30:52
Subject: Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running
Previous:From: Andrew DunstanDate: 2010-09-24 19:54:40
Subject: Re: snapshot generation broken

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group