Skip site navigation (1)
Skip section navigation (2)
## Re: levenshtein_less_equal (was: multibyte charater set in
levenshtein function)

### In response to

### Responses

### pgsql-hackers by date

I'll try to illustrate different approaches in examples. k i t t e n 0 1 2 3 4 5 6 s 1 1 2 3 4 5 6 i 2 2 1 2 3 4 5 t 3 3 2 1 2 3 4 t 4 4 3 2 1 2 3 i 5 5 4 3 2 2 3 n 6 6 5 4 3 3 2 g 7 7 6 5 4 4 3 The approach mentioned in Wikipedia limits filling of the matrix by diagonals, which are in k-distance from main diagonal (diagonal which contain left top cell). All other cell is guaranteed to have value greater than k. This approach can be extended to the case of costs different from 1, in this case limit diagonals will be closer to main diagonal proportional to the costs. Here is example of such limited matrix with k = 3. k i t t e n 0 1 2 3 s 1 1 2 3 4 i 2 2 1 2 3 4 t 3 3 2 1 2 3 4 t 4 3 2 1 2 3 i 4 3 2 2 3 n 4 3 3 2 g 4 4 3 The first idea of my approach is to use actual cell values to limit matrix filling. For each row the range of columns where cell values are not greater than k is stored. And in the next row only cells are caclucated, which use values of cells from previous row in stored range. Here is example of this approach with k = 3. There are slightly less filled cells but calculation are more complicated than in previoud approach. k i t t e n 0 1 2 3 s 1 1 2 3 i 2 2 1 2 3 t 3 3 2 1 2 3 t 3 2 1 2 3 i 3 2 2 3 n 3 3 2 g 3 The second idea is to make values in matrix possible greater. I analyze what exactly is matrix in this case. It is sum of original matrix, which represent distances between prefixes of strings, and matrix, which represent cost of insertions or deletions for moving to diagonal, which containing bottom right cell. There is an example of second matrix summand: k i t t e n 1 2 3 4 5 6 7 s 0 1 2 3 4 5 6 i 1 0 1 2 3 4 5 t 2 1 0 1 2 3 4 t 3 2 1 0 1 2 3 i 4 3 2 1 0 1 2 n 5 4 3 2 1 0 1 g 6 5 4 3 2 1 0 And an example of resulting matrix: k i t t e n 1 3 5 7 9 11 13 s 1 2 4 6 8 10 12 i 3 2 2 4 6 8 10 t 5 4 2 2 4 6 8 t 7 6 4 2 2 4 6 i 9 8 6 4 2 3 5 n 11 10 8 6 4 3 3 g 13 12 10 8 6 5 3 The resulting matrix saves important property of original matrix, that cell value always greater or equal than values, which are used for it's calculation. That's why we can use idea about matrix filling limiting for this matrix, but values in this matrix are greater and it allow to avoid filling bigger part of matrix. There is an example of matrix filling limiting for this matrix: k i t t e n 1 3 s 1 2 i 2 2 t 2 2 t 2 2 i 2 3 n 3 3 g 3 ---- With best regards, Alexander Korotkov.

- Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function) at 2010-09-24 20:01:24 from Robert Haas

- Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function) at 2010-09-28 02:52:40 from Robert Haas

Next: From:Devrim GÜNDÜZDate:2010-09-27 12:51:40Subject: Re: pg_filedump binary for CentOSPrevious: From: Dave PageDate: 2010-09-27 12:34:40Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session