This is the new home of the egghelp.org community forum.
All data has been migrated (including user logins/passwords) to a new phpBB version.


For more information, see this announcement post. Click the X in the top right-corner of this box to dismiss this message.

[script/library] Levenshtein's distance v1.0

Support & discussion of released scripts, and announcements of new releases.
Post Reply
User avatar
MenzAgitat
Op
Posts: 118
Joined: Tue Jul 04, 2006 12:35 pm
Location: France
Contact:

[script/library] Levenshtein's distance v1.0

Post by MenzAgitat »

 
 
This script provides package Levenshtein :

Code: Select all

Package provide Levenshtein 1.0

Description:

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e., the so called edit distance). The Levenshtein distance between two strings is given by the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character. A generalization of the Levenshtein distance (Damerau–Levenshtein distance) allows the transposition of two characters as an operation.

( see http://en.wikipedia.org/wiki/Levenshtein_distance )


Interest:
  • Allows an orthographical corrector to suggest alternate words with a low Levenshtein distance.
  • Allows pseudo-AI to have an orthographical tolerance.
  • ...

Syntax:

levenshtein::distance <string 1> <string 2>

you can also use a public command if you want to test things :

!test_levenshtein <string 1> <string 2>


Examples (in french, sorry):

Code: Select all

levenshtein::distance "bonjour" "bougeoir"
-> 4
you must manipulate 4 characters to transform the word "bonjour" into the word "bougeoir" :
  • BONJOUR
  • BOUJOUR -> we replace N by U
  • BOUGOUR -> we replace J by G
  • BOUGEOUR -> we insert E
  • BOUGEOIR -> we replace U by I

Code: Select all

levenshtein::distance "antiquaire" "antikaire"
-> 2
We can conclude from this 10 letters long example that it is very similar to the second word, with a distance of only 2.
You must keep in mind that a distance of 2 between two words of 10 letters means they are very similar, while a distance of 2 between two words of 3 letters means they are very different as you can see in the following example :

Code: Select all

levenshtein::distance "pin" "pas"
-> 2
As you can see, a distance of 2 doesnt mean much difference in a 10 letters word but represents important modifications in a 3 letters one.
In order to preserve relevance of results, you'll take care to always link the tolerance to the length, proportionately.

Code: Select all

levenshtein::distance "antiquaire" "dimanche"
-> 8
In this last example, we can see that the distance between the first word and the second is 8. They are very different words.


Download:

Levenshtein's distance v1.0
    Post Reply