## Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)

From: Alexander Korotkov Robert Haas Itagaki Takahiro , PostgreSQL Hackers Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function) 2010-09-27 12:42:52 AANLkTi=bzNBFhCVc+D=L5BP+BjHKYqd9PATBA5QMQQHF@mail.gmail.com (view raw or whole thread) 2010-09-24 20:01:24 from Robert Haas  2010-09-27 12:42:52 from Alexander Korotkov   2010-09-28 02:52:40 from Robert Haas    2010-10-04 20:19:18 from Alexander Korotkov     2010-10-08 01:52:26 from Robert Haas      2010-10-08 16:51:40 from Alexander Korotkov       2010-10-13 05:49:27 from Robert Haas        2010-10-13 13:32:36 from Tom Lane         2010-10-13 14:18:01 from Alvaro Herrera          2010-10-13 14:51:59 from Tom Lane           2010-10-13 15:22:17 from Robert Haas            2010-10-13 15:42:28 from Tom Lane             2010-10-13 15:53:35 from Alexander Korotkov              2010-10-13 15:58:32 from Alexander Korotkov             2010-10-13 16:30:25 from Robert Haas          2010-10-13 16:23:22 from Andres Freund pgsql-hackers
```I'll try to illustrate different approaches in examples.

k  i  t  t  e  n
0  1  2  3  4  5  6
s  1  1  2  3  4  5  6
i  2  2  1  2  3  4  5
t  3  3  2  1  2  3  4
t  4  4  3  2  1  2  3
i  5  5  4  3  2  2  3
n  6  6  5  4  3  3  2
g  7  7  6  5  4  4  3

The approach mentioned in Wikipedia limits filling of the matrix by
diagonals, which are in k-distance from main diagonal (diagonal which
contain left top cell). All other cell is guaranteed to have value greater
than k. This approach can be extended to the case of costs different from 1,
in this case limit diagonals will be closer to main diagonal proportional to
the costs. Here is example of such limited matrix with k = 3.

k  i  t  t  e  n
0  1  2  3
s  1  1  2  3  4
i  2  2  1  2  3  4
t  3  3  2  1  2  3  4
t     4  3  2  1  2  3
i        4  3  2  2  3
n           4  3  3  2
g              4  4  3

The first idea of my approach is to use actual cell values to limit matrix
filling. For each row the range of columns where cell values are not greater
than k is stored. And in the next row only cells are caclucated, which use
values of cells from previous row in stored range. Here is example of this
approach with k = 3. There are slightly less filled cells but calculation
are more complicated than in previoud approach.

k  i  t  t  e  n
0  1  2  3
s  1  1  2  3
i  2  2  1  2  3
t  3  3  2  1  2  3
t        3  2  1  2  3
i           3  2  2  3
n              3  3  2
g                    3

The second idea is to make values in matrix possible greater. I analyze what
exactly is matrix in this case. It is sum of original matrix, which
represent distances between prefixes of strings, and matrix, which represent
cost of insertions or deletions for moving to diagonal, which containing
bottom right cell. There is an example of second matrix summand:

k  i  t  t  e  n
1  2  3  4  5  6  7
s  0  1  2  3  4  5  6
i  1  0  1  2  3  4  5
t  2  1  0  1  2  3  4
t  3  2  1  0  1  2  3
i  4  3  2  1  0  1  2
n  5  4  3  2  1  0  1
g  6  5  4  3  2  1  0

And an example of resulting matrix:

k  i  t  t  e  n
1  3  5  7  9  11 13
s  1  2  4  6  8  10 12
i  3  2  2  4  6  8  10
t  5  4  2  2  4  6  8
t  7  6  4  2  2  4  6
i  9  8  6  4  2  3  5
n  11 10 8  6  4  3  3
g  13 12 10 8  6  5  3

The resulting matrix saves important property of original matrix, that cell
value always greater or equal than values, which are used for it's
calculation. That's why we can use idea about matrix filling limiting for
this matrix, but values in this matrix are greater and it allow to avoid
filling bigger part of matrix. There is an example of matrix filling
limiting for this matrix:

k  i  t  t  e  n
1  3
s  1  2
i     2  2
t        2  2
t           2  2
i              2  3
n                 3  3
g                    3

----
With best regards,
Alexander Korotkov.
```

### pgsql-hackers by date

 Next: From: Devrim GÜNDÜZ Date: 2010-09-27 12:51:40 Subject: Re: pg_filedump binary for CentOS Previous: From: Dave Page Date: 2010-09-27 12:34:40 Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session