Package org.genesys.taxonomy.checker
Class StringSimilarity
java.lang.Object
org.genesys.taxonomy.checker.StringSimilarity
Code from https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Dice's_coefficient#Java
Constructor Summary
Constructors
StringSimilarity()
Method Summary
Modifier and Type | Method | Description
static double | diceCoefficientOptimized(s, t) | Retrieved from https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Dice's_coefficient#Java An optimized version of the Dice coefficient calculation.
static double | getLevenshteinCoefficient(a, b) | Get the Levenshtein coefficient of two strings.
Constructor Details
StringSimilarity
public StringSimilarity()
Method Details
diceCoefficientOptimized
Retrieved from https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Dice's_coefficient#Java
Here's an optimized version of the Dice coefficient calculation. It takes advantage of the fact that a bigram of 2 chars can be stored in 1 int, and applies a matching algorithm of O(n*log(n)) instead of O(n*n).
Note that, at the time of writing, this implementation differs from the other implementations on that page. Where the other algorithms incorrectly store the generated bigrams in a set (discarding duplicates), this implementation treats each occurrence of a bigram as distinct. The correctness of this behavior is most easily seen when computing the similarity between "GG" and "GGGGGGGG", which should clearly not be 1.
Parameters:
s - The first string
t - The second string
Returns:
The Dice coefficient between the two input strings. Returns 0 if one or both of the strings are null. Also returns 0 if one or both of the strings contain fewer than 2 characters and are not equal.
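The approach described above (one int per bigram, sort, then a linear merge that counts duplicates separately) can be illustrated with a minimal sketch. This is not the class's actual source: the names `DiceSketch`, `bigrams`, and `dice` are hypothetical, and returning 1 for equal strings of any length is an assumption inferred from the return-value description.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the StringSimilarity source.
class DiceSketch {

    // Pack each adjacent pair of chars into a single int and sort the result.
    static int[] bigrams(String s) {
        int[] pairs = new int[s.length() - 1];
        for (int i = 0; i < pairs.length; i++) {
            pairs[i] = (s.charAt(i) << 16) | s.charAt(i + 1);
        }
        Arrays.sort(pairs); // O(n*log(n))
        return pairs;
    }

    static double dice(String s, String t) {
        if (s == null || t == null) {
            return 0;
        }
        if (s.equals(t)) {
            return 1; // assumption: equal strings always score 1
        }
        if (s.length() < 2 || t.length() < 2) {
            return 0; // no bigrams to compare
        }
        int[] a = bigrams(s);
        int[] b = bigrams(t);
        int matches = 0;
        int i = 0;
        int j = 0;
        // Merge-style scan over both sorted arrays. Each occurrence of a
        // bigram can be matched at most once, so duplicates are not collapsed.
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                matches++;
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return 2.0 * matches / (a.length + b.length);
    }
}
```

With this sketch, `DiceSketch.dice("GG", "GGGGGGGG")` evaluates to 2 * 1 / (1 + 7) = 0.25, whereas a set-based variant would collapse both inputs to the single bigram "GG" and report 1.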
getLevenshteinCoefficient
Get the Levenshtein coefficient of two strings.
Parameters:
a - first string
b - second string
Returns:
Coefficient value between 0 and 1
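This page does not say how the coefficient is derived from the edit distance. One common normalization, shown here purely as an assumption, is 1 - distance / max(length), which yields 1 for identical strings and 0 for strings with nothing in common. The names `LevenshteinSketch`, `distance`, and `coefficient` are hypothetical, and the actual method may normalize differently or handle null inputs in its own way.

```java
// Hypothetical illustration only; the real normalization may differ.
class LevenshteinSketch {

    // Classic dynamic-programming edit distance using two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // transforming "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // transforming a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1 - distance / length of the longer string.
    static double coefficient(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / longest;
    }
}
```

Under this assumption, "kitten" and "sitting" have an edit distance of 3 and a longest length of 7, so the coefficient is 1 - 3/7.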