This script provides the Levenshtein package:
Code:
package provide Levenshtein 1.0
Description:
In information theory and computer science, the Levenshtein distance is a metric for measuring the difference between two sequences (the so-called edit distance). The Levenshtein distance between two strings is the minimum number of operations needed to transform one string into the other, where an operation is the insertion, deletion, or substitution of a single character. A generalization, the Damerau–Levenshtein distance, also allows the transposition of two adjacent characters as an operation.
(see http://en.wikipedia.org/wiki/Levenshtein_distance)
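To make the definition concrete, here is a minimal sketch of the classic dynamic-programming approach in Tcl. It is not the code shipped in the package: the proc name levDistance is made up for illustration, it assumes Tcl 8.5 or later (for min() in expr), and it compares characters case-sensitively, which may differ from what the package does.

```tcl
# Hypothetical helper, NOT the package's own code: two-row
# dynamic-programming Levenshtein distance (assumes Tcl 8.5+).
proc levDistance {a b} {
    set lenA [string length $a]
    set lenB [string length $b]
    # Row 0: turning "" into the first j characters of $b takes j insertions.
    set prev {}
    for {set j 0} {$j <= $lenB} {incr j} {
        lappend prev $j
    }
    for {set i 1} {$i <= $lenA} {incr i} {
        # Turning the first i characters of $a into "" takes i deletions.
        set curr [list $i]
        set ca [string index $a [expr {$i - 1}]]
        for {set j 1} {$j <= $lenB} {incr j} {
            # Substitution is free when the two characters already match.
            set cost [expr {$ca eq [string index $b [expr {$j - 1}]] ? 0 : 1}]
            set del [expr {[lindex $prev $j] + 1}]
            set ins [expr {[lindex $curr [expr {$j - 1}]] + 1}]
            set sub [expr {[lindex $prev [expr {$j - 1}]] + $cost}]
            lappend curr [expr {min($del, $ins, $sub)}]
        }
        set prev $curr
    }
    return [lindex $prev $lenB]
}

puts [levDistance "bonjour" "bougeoir"]   ;# 4
puts [levDistance "pin" "pas"]            ;# 2
```

Only two rows of the table are kept at any time, so memory grows with the length of the second string rather than with the product of both lengths.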
Uses:
- Lets a spell checker suggest alternative words with a low Levenshtein distance.
- Lets a pseudo-AI tolerate spelling mistakes.
- ...
Syntax:
levenshtein::distance <string 1> <string 2>
You can also use a public command if you want to try it out:
!test_levenshtein <string 1> <string 2>
Examples (in French, sorry):
Code:
levenshtein::distance "bonjour" "bougeoir"
-> 4
- BONJOUR
- BOUJOUR -> replace N with U
- BOUGOUR -> replace J with G
- BOUGEOUR -> insert E
- BOUGEOIR -> replace U with I
Code:
levenshtein::distance "antiquaire" "antikaire"
-> 2
Keep in mind that a distance of 2 between two 10-letter words means they are very similar, while a distance of 2 between two 3-letter words means they are very different, as the following example shows:
Code:
levenshtein::distance "pin" "pas"
-> 2
To keep the results relevant, always scale the tolerance in proportion to the length of the words.
Code:
levenshtein::distance "antiquaire" "dimanche"
-> 8
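One way to tie the tolerance to the word length is to compare the distance with the length of the longer word. The sketch below is only an illustration: the proc name withinTolerance and the default ratio of 0.25 are assumptions, not part of the package, and the distance is passed in (for example from levenshtein::distance) so the snippet runs on its own.

```tcl
# Hypothetical helper, not part of the package: accept a match only when
# the distance is small relative to the longer of the two words.
# The default ratio of 0.25 is an arbitrary assumption; tune it to taste.
proc withinTolerance {distance a b {ratio 0.25}} {
    set longest [expr {max([string length $a], [string length $b])}]
    if {$longest == 0} {
        return 1 ;# two empty strings are identical
    }
    return [expr {double($distance) / $longest <= $ratio}]
}

# Distances taken from the examples above.
puts [withinTolerance 2 "antiquaire" "antikaire"]  ;# 1 (2/10 = 0.20)
puts [withinTolerance 2 "pin" "pas"]               ;# 0 (2/3  = 0.67)
puts [withinTolerance 8 "antiquaire" "dimanche"]   ;# 0 (8/10 = 0.80)
```

With this rule the same raw distance of 2 is accepted for the 10-letter words and rejected for the 3-letter ones.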
Download:
Levenshtein's distance v1.0