Class MostFrequentKChars


  • public class MostFrequentKChars
    extends Object
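    Based on the pseudocode at https://en.wikipedia.org/wiki/Most_frequent_k_characters and http://rosettacode.org/wiki/Most_frequent_k_chars_distance. Digits [0-9] in the input are not handled, presumably because the hash itself encodes each character's count in digits (e.g. "i3b2"), so digits in the input would be ambiguous.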
    • Constructor Detail

      • MostFrequentKChars

        public MostFrequentKChars()
    • Method Detail

      • getMostFrequentKHash

        public static String getMostFrequentKHash(String string,
                                                  int k)
        Returns the hash of an input string, built from at most k of its most frequent characters.
                String function MostFreqKHashing (String inputString, int K)
                        def string outputString
                        for each distinct character
                            count occurrence of each character
                        for i := 0 to K
                            char c = next most freq ith character  (if two chars have same frequency then get the first occurrence in inputString)
                            int count = number of occurrence of the character
                            append to outputString, c and count
                        end for
                        return outputString
         
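        Parameters:
        string - the input string to hash
        k - the maximum number of most frequent characters to include in the hash
        Returns:
        the hash: each selected character followed by its occurrence count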
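The pseudocode above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the implementation behind getMostFrequentKHash: the class name MostFreqKHashSketch and the method name hash are hypothetical, and the input is assumed to contain no digits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class, not part of MostFrequentKChars.
class MostFreqKHashSketch {
    static String hash(String input, int k) {
        // LinkedHashMap keeps characters in order of first occurrence.
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : input.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        // List.sort is stable, so characters with equal counts stay in
        // first-occurrence order, as the pseudocode requires.
        List<Map.Entry<Character, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        // Append at most k (character, count) pairs.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            out.append(entries.get(i).getKey()).append(entries.get(i).getValue());
        }
        return out.toString();
    }
}
```

With this sketch, hash("research", 2) yields "r2e2" and hash("seeking", 2) yields "e2s1".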
      • getMostFreqKSimilarity

        public static int getMostFreqKSimilarity(String hash1,
                                                 String hash2)
        Calculate the similarity of two hashes given in string form.
        Parameters:
        hash1 - the first hash string
        hash2 - the second hash string
        Returns:
        the similarity of the two hashes
      • getMostFreqKSimilarity

        public static int getMostFreqKSimilarity(int[] hash1,
                                                 int[] hash2)
        Calculate the similarity of two hashes given in array form.
                        int function MostFreqKSimilarity (String inputStr1, String inputStr2, int limit)
                            def int similarity
                            for each c = next character from inputStr1
                                lookup c in inputStr2
                                if c is null
                                     continue
                                // similarity += frequency of c in inputStr1
                                similarity += frequency of c in inputStr1 + frequency of c in inputStr2
                            // return limit - similarity
                            return similarity
         
        Parameters:
        hash1 - the first hash array
        hash2 - the second hash array
        Returns:
        the similarity of the two hashes
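The similarity pseudocode above can be illustrated on string hashes such as "r2e2". The layout of the int[] hash is not documented here, so this sketch works on the string form only. The names MostFreqKSimilaritySketch, parse and similarity are hypothetical, and the sketch follows the active line of the pseudocode (adding the frequencies from both hashes), not the commented-out variant.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch class, not part of MostFrequentKChars.
class MostFreqKSimilaritySketch {
    // Parses a hash such as "r2e2" into character -> count.
    // Assumes each character is followed by its decimal count.
    static Map<Character, Integer> parse(String hash) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        int i = 0;
        while (i < hash.length()) {
            char c = hash.charAt(i++);
            int count = 0;
            while (i < hash.length() && hash.charAt(i) >= '0' && hash.charAt(i) <= '9') {
                count = count * 10 + (hash.charAt(i++) - '0');
            }
            counts.put(c, count);
        }
        return counts;
    }

    static int similarity(String hash1, String hash2) {
        Map<Character, Integer> first = parse(hash1);
        Map<Character, Integer> second = parse(hash2);
        int similarity = 0;
        for (Map.Entry<Character, Integer> e : first.entrySet()) {
            Integer other = second.get(e.getKey());
            if (other == null) {
                continue; // character not present in the second hash
            }
            similarity += e.getValue() + other;
        }
        return similarity;
    }
}
```

For "r2e2" and "e2s1" the only shared character is e, so the sketch returns 2 + 2 = 4.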
      • mostFreqKSDF

        public static int mostFreqKSDF(String inputStr1,
                                       String inputStr2,
                                       int K,
                                       int maxDistance)
        Wrapper function: hashes both input strings and subtracts the similarity of the two hashes from maxDistance.
                        int function MostFreqKSDF (string inputStr1, string inputStr2, int K, int maxDistance)
                            return maxDistance - MostFreqKSimilarity(MostFreqKHashing(inputStr1,K), MostFreqKHashing(inputStr2,K))
         
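        Parameters:
        inputStr1 - the first input string
        inputStr2 - the second input string
        K - the maximum number of most frequent characters used for each hash
        maxDistance - the upper bound from which the similarity is subtracted
        Returns:
        maxDistance minus the similarity of the two hashes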
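As a worked illustration of the wrapper pseudocode, the following self-contained sketch combines hashing and similarity in one place. It keeps the hashes as maps instead of strings for brevity. The names MostFreqKSdfSketch, topK and sdf are hypothetical, and the similarity step again assumes the "frequency in both strings" variant from the pseudocode above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class, not part of MostFrequentKChars.
class MostFreqKSdfSketch {
    // Returns the k most frequent characters with their counts;
    // ties are resolved by first occurrence (stable sort).
    static Map<Character, Integer> topK(String input, int k) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : input.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        List<Map.Entry<Character, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        Map<Character, Integer> top = new LinkedHashMap<>();
        for (Map.Entry<Character, Integer> e : entries.subList(0, Math.min(k, entries.size()))) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }

    static int sdf(String s1, String s2, int k, int maxDistance) {
        Map<Character, Integer> h1 = topK(s1, k);
        Map<Character, Integer> h2 = topK(s2, k);
        int similarity = 0;
        for (Map.Entry<Character, Integer> e : h1.entrySet()) {
            Integer other = h2.get(e.getKey());
            if (other != null) {
                similarity += e.getValue() + other;
            }
        }
        return maxDistance - similarity;
    }
}
```

Under these assumptions, sdf("research", "seeking", 2, 10) evaluates to 10 - 4 = 6, and two strings with no shared top-k characters return maxDistance unchanged.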
      • mostFreqKSDF

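        public static double mostFreqKSDF(String inputStr1,
                                          String inputStr2,
                                          int K)
        Most frequent K characters distance, computed without an explicit maxDistance.
        Parameters:
        inputStr1 - the first input string
        inputStr2 - the second input string
        K - the maximum number of most frequent characters used for each hash
        Returns:
        the distance as a double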
      • toHashString

        public static String toHashString(int[] h1)
        Encode a hash array to String.
        Parameters:
        h1 - hash array as generated
        Returns:
        String representation of the hash array (e.g. "i3b2")
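The layout of the hash array is not specified on this page. Purely as an illustration, the sketch below assumes a hypothetical layout of alternating character codes and counts, which is enough to reproduce the "i3b2" example; the real array format may differ. The class name HashStringSketch is also hypothetical.

```java
// Hypothetical sketch class, not part of MostFrequentKChars.
class HashStringSketch {
    // Assumed layout (not confirmed by the documentation):
    // [charCode0, count0, charCode1, count1, ...]
    static String toHashString(int[] hash) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + 1 < hash.length; i += 2) {
            out.append((char) hash[i]).append(hash[i + 1]);
        }
        return out.toString();
    }
}
```

Under that assumed layout, toHashString(new int[] {'i', 3, 'b', 2}) produces "i3b2".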