Class StringSimilarity


  • public class StringSimilarity
    extends Object
    Code from
    • https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Dice's_coefficient#Java
    • Constructor Detail

      • StringSimilarity

        public StringSimilarity()
    • Method Detail

      • diceCoefficientOptimized

        public static double diceCoefficientOptimized(String s,
                                                      String t)
        Retrieved from https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Dice's_coefficient#Java
        An optimized version of the Dice coefficient calculation. It takes advantage of the fact that a bigram of two chars fits in a single int, and it matches bigrams with an O(n*log(n)) algorithm instead of an O(n*n) one.
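
        The two ideas in this description can be sketched as follows. This is a minimal illustration, not the source of this class: the names DiceSketch, packedBigrams and dice are hypothetical, and the use of equals() for the "identical strings" shortcut is an assumption based on the documented return values.

```java
import java.util.Arrays;

// Hypothetical sketch class; not the actual StringSimilarity source.
class DiceSketch {

    // Pack each adjacent pair of chars into one int:
    // the first char goes in the high 16 bits, the second in the low 16 bits.
    static int[] packedBigrams(String s) {
        int[] pairs = new int[s.length() - 1];
        for (int i = 0; i < pairs.length; i++) {
            pairs[i] = (s.charAt(i) << 16) | s.charAt(i + 1);
        }
        return pairs;
    }

    static double dice(String s, String t) {
        if (s == null || t == null) return 0;           // null input
        if (s.equals(t)) return 1;                      // assumption: equality shortcut
        if (s.length() < 2 || t.length() < 2) return 0; // no bigrams to compare

        int[] a = packedBigrams(s);
        int[] b = packedBigrams(t);

        // Sorting costs O(n*log(n)); the merge-style walk below is linear.
        Arrays.sort(a);
        Arrays.sort(b);

        int matches = 0, i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {        // same bigram: consume one occurrence from each side
                matches++;
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return 2.0 * matches / (a.length + b.length);
    }

    public static void main(String[] args) {
        // "night" and "nacht" share only the bigram "ht": 2*1 / (4+4)
        System.out.println(dice("night", "nacht")); // prints 0.25
    }
}
```

        Because both arrays are sorted, equal bigrams line up during the walk, so no nested loop over all pairs is needed.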

        Note that, at the time of writing, this implementation differs from the other implementations on that Wikibooks page. Where the other algorithms incorrectly store the generated bigrams in a set (discarding duplicates), this implementation counts every occurrence of a repeated bigram separately. The correctness of this behavior is most easily seen in the similarity between "GG" and "GGGGGGGG": the first string has one bigram and the second has seven, so the coefficient is 2*1/(1+7) = 0.25, whereas a set-based implementation reduces both strings to the single bigram "GG" and reports 1.
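
        The difference between the two counting strategies can be reproduced with a small comparison. This is a sketch for illustration only: BigramCountingDemo and its methods are hypothetical names, the bigrams are kept as plain strings for readability, and both methods assume inputs of at least two characters.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; not part of StringSimilarity.
class BigramCountingDemo {

    // All bigrams of s in order, duplicates included.
    static List<String> bigrams(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            out.add(s.substring(i, i + 2));
        }
        return out;
    }

    // Set-based variant: repeated bigrams collapse into one entry.
    static double diceWithSets(String s, String t) {
        Set<String> a = new HashSet<>(bigrams(s));
        Set<String> b = new HashSet<>(bigrams(t));
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return 2.0 * common.size() / (a.size() + b.size());
    }

    // Multiset variant: each occurrence can be matched at most once.
    // Quadratic on purpose, to keep the demo short.
    static double diceWithMultisets(String s, String t) {
        List<String> a = bigrams(s);
        List<String> b = bigrams(t);
        List<String> remaining = new ArrayList<>(b);
        int matches = 0;
        for (String bigram : a) {
            if (remaining.remove(bigram)) { // removes one occurrence if present
                matches++;
            }
        }
        return 2.0 * matches / (a.size() + b.size());
    }

    public static void main(String[] args) {
        System.out.println(diceWithSets("GG", "GGGGGGGG"));      // prints 1.0
        System.out.println(diceWithMultisets("GG", "GGGGGGGG")); // prints 0.25
    }
}
```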

        Parameters:
        s - The first string
        t - The second string
        Returns:
        The Dice coefficient between the two input strings. Returns 0 if either string is null. Also returns 0 if either string contains fewer than two characters and the strings are not equal.
      • getLevenshteinCoefficient

        public static double getLevenshteinCoefficient(String a,
                                                       String b)