Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)
От | Alexander Korotkov |
---|---|
Тема | Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function) |
Дата | |
Msg-id | AANLkTi=bzNBFhCVc+D=L5BP+BjHKYqd9PATBA5QMQQHF@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function) (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: levenshtein_less_equal (was: multibyte charater set in
levenshtein function)
(Robert Haas <robertmhaas@gmail.com>)
|
Список | pgsql-hackers |
I'll try to illustrate different approaches in examples. <br /><br /><span style="font-family: courier new,monospace;"> k i t t e n<br /> 0 1 2 3 4 5 6<br />s 1 1 2 3 4 5 6<br />i 2 2 1 2 3 4 5<br/> t 3 3 2 1 2 3 4<br />t 4 4 3 2 1 2 3<br />i 5 5 4 3 2 2 3<br />n 6 6 5 4 3 3 2<br />g 7 7 6 5 4 4 3<br /><br /></span>The approach mentioned in Wikipedia limits filling of the matrix by diagonals,which are in k-distance from main diagonal (diagonal which contain left top cell). All other cell is guaranteedto have value greater than k. This approach can be extended to the case of costs different from 1, in this caselimit diagonals will be closer to main diagonal proportional to the costs. Here is example of such limited matrix withk = 3.<br /><span style="font-family: courier new,monospace;"><br /> k i t t e n<br /> 0 1 2 3 <br />s 1 1 2 3 4 <br />i 2 2 1 2 3 4 <br />t 3 3 2 1 2 3 4<br />t 4 3 2 1 2 3<br/>i 4 3 2 2 3<br /> n 4 3 3 2<br />g 4 4 3<br /><br /></span>The first idea ofmy approach is to use actual cell values to limit matrix filling. For each row the range of columns where cell values arenot greater than k is stored. And in the next row only cells are caclucated, which use values of cells from previous rowin stored range. Here is example of this approach with k = 3. There are slightly less filled cells but calculation aremore complicated than in previoud approach.<br /><span style="font-family: courier new,monospace;"><br /> k i t t e n<br /> 0 1 2 3 <br />s 1 1 2 3 <br />i 2 2 1 2 3 <br />t 3 3 2 1 2 3 <br />t 3 2 1 2 3<br />i 3 2 2 3<br /> n 3 3 2<br />g 3<br /><br/></span>The second idea is to make values in matrix possible greater. I analyze what exactly is matrix in this case.It is sum of original matrix, which represent distances between prefixes of strings, and matrix, which represent costof insertions or deletions for moving to diagonal, which containing bottom right cell. There is an example of secondmatrix summand:<br /><span style="font-family: courier new,monospace;"><br /></span><span style="font-family: couriernew,monospace;"> k i t t e n<br /> 1 2 3 4 5 6 7<br /> s 0 1 2 3 4 5 6<br /> i 1 0 1 2 3 4 5<br /> t 2 1 0 1 2 3 4<br /> t 3 2 1 0 1 2 3<br /> i 4 3 2 1 0 1 2<br /> n 5 4 3 2 1 0 1<br /> g 6 5 4 3 2 1 0<br /></span><br />And an example of resulting matrix:<br /><br /><span style="font-family:courier new,monospace;"> k i t t e n<br /> 1 3 5 7 9 11 13<br />s 1 2 4 6 8 1012<br />i 3 2 2 4 6 8 10<br /> t 5 4 2 2 4 6 8<br />t 7 6 4 2 2 4 6<br />i 9 8 6 4 2 3 5<br/>n 11 10 8 6 4 3 3<br />g 13 12 10 8 6 5 3<br /></span><br />The resulting matrix saves important propertyof original matrix, that cell value always greater or equal than values, which are used for it's calculation. That'swhy we can use idea about matrix filling limiting for this matrix, but values in this matrix are greater and it allowto avoid filling bigger part of matrix. There is an example of matrix filling limiting for this matrix:<br /><span style="font-family:courier new,monospace;"><br /> k i t t e n<br /> 1 3 <br />s 1 2 <br />i 2 2 <br />t 2 2 <br />t 2 2 <br />i 2 3 <br /> n 3 3<br /> g 3<br /></span><br />----<br />With best regards,<br/>Alexander Korotkov.<br />
В списке pgsql-hackers по дате отправления: