Diffing words - percentages

Is there a way to do the following?

Say I have two words:

WelcomeMattTom

and 

WelcomeMTom

How can I compare the two words to find out how alike they are, as a percentage?

For example, how similar is WelcomeMTom to WelcomeMattTom?

Not clear yet?

Say I introduce a third word, WelcomeMattTomm. How similar is WelcomeMattTomm to WelcomeMattTom?

I'm looking for a way to do this in bash/awk, something like this:

./script.sh <firstword>  <secondword>
98%

which would mean secondword is 98% similar to firstword.

OS: Linux

You want a similarity algorithm.

Here is a good article explaining one approach (its examples are in Java):
How to Strike a Match

Levenshtein distance may be the most likely candidate for you:
Levenshtein distance - Wikipedia, the free encyclopedia

Here is the Perl module WordNet::Similarity:
WordNet::Similarity - search.cpan.org

You have to download this module and part of the parent module, too. It gives examples. You will have to work out your own percentage calculation from the results of a module like this one, or roll your own (see the first article above). I would recommend doing some reading (above) before messing with this. Similarity algorithms can do interesting and sometimes confusing things, IMO.
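
To give an idea of what "roll your own" could look like: as I read the first article, it scores two strings by how many adjacent letter pairs they share, using 2 * shared / (pairs in first + pairs in second). Below is a minimal sketch of that idea in sh/awk. The function name pair_similarity is made up for this example, and unlike the article it does not change case or split phrases into words, so treat it as a simplification rather than the article's code.

```shell
#!/bin/sh
# pair_similarity WORD1 WORD2  (hypothetical helper, not taken from the article)
# Prints the share of adjacent letter pairs the two words have in common.
# Assumes the words contain no backslashes, since awk -v interprets escapes.
pair_similarity() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = length(a) - 1      # number of adjacent pairs in the first word
        nb = length(b) - 1      # number of adjacent pairs in the second word
        if (na < 1 || nb < 1) { print "0%"; exit }
        # Count every adjacent pair of characters in the first word.
        for (i = 1; i <= na; i++) pairs[substr(a, i, 2)]++
        # Each pair of the second word may use up one pair of the first.
        for (i = 1; i <= nb; i++) {
            p = substr(b, i, 2)
            if (pairs[p] > 0) { pairs[p]--; shared++ }
        }
        printf "%d%%\n", int(200 * shared / (na + nb))
    }'
}
```

With the words from the question, pair_similarity WelcomeMattTom WelcomeMTom prints 78% (9 shared pairs out of 13 + 10). To call it as ./script.sh firstword secondword, add pair_similarity "$1" "$2" as the last line of the file.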


Hi.

You might start here: Levenshtein distance - Wikipedia, the free encyclopedia, where there is an explanation of edit distance, some pseudocode, and a number of references. Found with a Google search for "distance between 2 strings".

Google is your friend.

Best wishes ... cheers, drl

( edit 1: similar to Jim's reply )


Oh wow, thanks guys!

I thought there'd be a quick fix for this, but I guess I was wrong. lol

Here is some GNU awk code for calculating the Levenshtein distance. I hope this helps.
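
For readers who just want to see the shape of such a script, here is a minimal sketch of the standard dynamic-programming Levenshtein calculation in plain awk, wrapped in a shell function. It is not the code referred to above. The function name levenshtein_percent is made up, and turning the distance into a percentage by dividing by the length of the longer word is my own assumption; other normalisations are just as defensible.

```shell
#!/bin/sh
# levenshtein_percent WORD1 WORD2  (hypothetical helper)
# Prints 100 * (1 - distance / length of the longer word), rounded down.
# Assumes the words contain no backslashes, since awk -v interprets escapes.
levenshtein_percent() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        la = length(a); lb = length(b)
        if (la == 0 && lb == 0) { print "100%"; exit }
        # prev[j] = distance between the first i-1 chars of a and the first j chars of b
        for (j = 0; j <= lb; j++) prev[j] = j
        for (i = 1; i <= la; i++) {
            cur[0] = i
            for (j = 1; j <= lb; j++) {
                cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
                best = prev[j - 1] + cost                         # match or substitute
                if (prev[j] + 1 < best) best = prev[j] + 1        # delete a char of a
                if (cur[j - 1] + 1 < best) best = cur[j - 1] + 1  # insert a char of b
                cur[j] = best
            }
            for (j = 0; j <= lb; j++) prev[j] = cur[j]
        }
        max = (la > lb) ? la : lb
        printf "%d%%\n", int(100 * (max - prev[lb]) / max)
    }'
}
```

With the words from the question, levenshtein_percent WelcomeMattTom WelcomeMTom prints 78% (distance 3, longer word 14 characters), and levenshtein_percent WelcomeMattTomm WelcomeMattTom prints 93% (distance 1, longer word 15 characters).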


If you just want a quick-and-dirty estimate (needs bash and GNU sed):

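# Split each word into one character per line, diff the two lists, and count the differing lines.
d=$(diff <(printf '%s\n' "$1" | sed 's/./&\n/g') <(printf '%s\n' "$2" | sed 's/./&\n/g') | grep -c '^[<>]')
echo "$((100 - 100 * d / (${#1} + ${#2})))%"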
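
In case it helps anyone reusing this: the two lines above divide by zero when both words are empty, and the <( ) syntax only works in bash. Here is a sketch of the same diff-and-count idea as a function that guards the empty case and uses temporary files instead, so it should also run under plain sh. The function name quick_similarity is made up, and it assumes mktemp is available and that sed is GNU sed (for \n in the replacement), as in the original.

```shell
#!/bin/sh
# quick_similarity WORD1 WORD2  (hypothetical helper)
# Same estimate as above: 100 - 100 * differing lines / combined length.
quick_similarity() {
    total=$(( ${#1} + ${#2} ))
    # Two empty words are identical; this also avoids dividing by zero.
    if [ "$total" -eq 0 ]; then echo "100%"; return; fi
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    # One character per line, so diff compares character by character.
    printf '%s\n' "$1" | sed 's/./&\n/g' > "$tmp1"
    printf '%s\n' "$2" | sed 's/./&\n/g' > "$tmp2"
    # Count the lines diff marks as removed (<) or added (>).
    d=$(diff "$tmp1" "$tmp2" | grep -c '^[<>]')
    rm -f "$tmp1" "$tmp2"
    echo "$(( 100 - 100 * d / total ))%"
}
```

With the words from the question, quick_similarity WelcomeMattTom WelcomeMTom prints 88% (3 differing lines, 25 characters in total), and quick_similarity WelcomeMattTomm WelcomeMattTom prints 97%.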

Oh my! This one does exactly what I wanted. I knew there had to be a much simpler way. Thank you so much!

And thanks to everyone else who responded. I really appreciate your help. Thank you!