Class StringSimilarity

java.lang.Object
org.genesys.taxonomy.checker.StringSimilarity

public class StringSimilarity extends Object
Code from
  • https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Dice's_coefficient#Java
  • Constructor Details

    • StringSimilarity

      public StringSimilarity()
  • Method Details

    • diceCoefficientOptimized

      public static double diceCoefficientOptimized(String s, String t)
      Retrieved from https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Dice's_coefficient#Java. An optimized version of the Dice coefficient calculation. It exploits the fact that a bigram of two chars fits in a single int, and it matches bigrams in O(n*log(n)) time instead of O(n*n).

      Note that, at the time of writing, this implementation differs from the other implementations on that Wikibooks page. Where the other algorithms incorrectly store the generated bigrams in a set (discarding duplicates), this implementation counts every occurrence of a bigram separately. The correctness of this behavior is most easily seen in the similarity between "GG" and "GGGGGGGG", which should obviously not be 1.

      Parameters:
      s - The first string
      t - The second string
      Returns:
      The Dice coefficient between the two input strings. Returns 0 if one or both of the strings are null. Also returns 0 if one or both of the strings contain fewer than 2 characters and the strings are not equal.
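      The approach described above can be illustrated with a minimal sketch. This is not the library's source: the class and method names (DiceSketch, bigrams, dice) are hypothetical, and the handling of equal short strings is an assumption based on the Returns text. Each bigram is packed into one int, both bigram arrays are sorted, and a single merge pass counts matches so that duplicate bigrams are counted once per occurrence.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the actual StringSimilarity source.
final class DiceSketch {

    // Packs each pair of adjacent chars into one int and returns the sorted array.
    // Caller must ensure s has at least 2 characters.
    static int[] bigrams(String s) {
        int[] pairs = new int[s.length() - 1];
        for (int i = 0; i < pairs.length; i++) {
            pairs[i] = (s.charAt(i) << 16) | s.charAt(i + 1);
        }
        Arrays.sort(pairs); // O(n log n); enables the linear merge below
        return pairs;
    }

    static double dice(String s, String t) {
        if (s == null || t == null) return 0;            // null input yields 0
        if (s.equals(t)) return 1;                       // assumption: equal strings score 1
        if (s.length() < 2 || t.length() < 2) return 0;  // no bigrams to compare

        int[] a = bigrams(s);
        int[] b = bigrams(t);
        int matches = 0, i = 0, j = 0;
        // Merge pass over both sorted arrays; each bigram occurrence matches at most once.
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                matches += 2;
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return (double) matches / (a.length + b.length);
    }
}
```

      With this sketch, DiceSketch.dice("GG", "GGGGGGGG") gives 0.25: one bigram matches out of 1 + 7 bigrams in total, whereas a set-based variant would report 1.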
    • getLevenshteinCoefficient

      public static double getLevenshteinCoefficient(String a, String b)
      Gets the Levenshtein coefficient of two strings.
      Parameters:
      a - first string
      b - second string
      Returns:
      Coefficient value between 0 and 1
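
      This page does not state how the coefficient is derived from the edit distance. One common normalization, shown here purely as an assumption, is 1 - distance / max(length of a, length of b), which always falls between 0 and 1. The names below (LevenshteinSketch, distance, coefficient) are hypothetical, and the sketch does not handle null arguments.

```java
// Hypothetical illustration only; the real getLevenshteinCoefficient may normalize differently.
final class LevenshteinSketch {

    // Classic dynamic-programming edit distance using two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 for identical strings, 0.0 when nothing aligns.
    static double coefficient(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / longest;
    }
}
```

      Under this assumed normalization, "kitten" and "sitting" have an edit distance of 3 and a longest length of 7, so the coefficient is 4/7.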