Package org.genesys.taxonomy.checker
Class MostFrequentKChars
- java.lang.Object
  - org.genesys.taxonomy.checker.MostFrequentKChars
public class MostFrequentKChars extends Object
Based on the pseudocode at https://en.wikipedia.org/wiki/Most_frequent_k_characters and http://rosettacode.org/wiki/Most_frequent_k_chars_distance.
Does not handle digits [0-9] in the input, presumably because digits are used to encode the character counts in the hash string (e.g. "i3b2").
Constructor Summary
Constructors
- MostFrequentKChars()
Method Summary
All methods are static.
- static int getMostFreqKSimilarity(int[] hash1, int[] hash2): Calculate the similarity of the two hashes.
- static int getMostFreqKSimilarity(String hash1, String hash2): Calculate the similarity of the two hashes.
- static String getMostFrequentKHash(String string, int k): Get the hash for an input string with at most K most frequent characters.
- static double mostFreqKSDF(String inputStr1, String inputStr2, int K): Most freq ksdf.
- static int mostFreqKSDF(String inputStr1, String inputStr2, int K, int maxDistance): Wrapper function.
- static String toHashString(int[] h1): Encode a hash array to String.
Method Detail
-
getMostFrequentKHash
public static String getMostFrequentKHash(String string, int k)
Get the hash for an input string with at most K most frequent characters.

    String function MostFreqKHashing (String inputString, int K)
        def string outputString
        for each distinct character
            count occurrence of each character
        for i := 0 to K
            char c = next most freq ith character (if two chars have same frequency then get the first occurrence in inputString)
            int count = number of occurrence of the character
            append to outputString, c and count
        end for
        return outputString

Parameters:
string - the input string
k - the maximum number of most frequent characters to include in the hash
Returns:
the most frequent K hash of the input string
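The pseudocode above can be sketched in Java roughly as follows. This is a minimal illustration, not the library's implementation: the class `MostFreqKHashSketch` and its `hash` method are hypothetical names, and the sketch assumes the input contains no digits and that counts are appended in decimal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class; not part of org.genesys.taxonomy.checker.
final class MostFreqKHashSketch {

    static String hash(String input, int k) {
        // LinkedHashMap remembers first-occurrence order, used as the tie-breaker.
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : input.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }

        // List.sort is stable, so characters with equal counts
        // stay in the order they first appeared in the input.
        List<Map.Entry<Character, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        // Append up to k (character, count) pairs.
        StringBuilder out = new StringBuilder();
        int limit = Math.max(0, Math.min(k, entries.size()));
        for (int i = 0; i < limit; i++) {
            out.append(entries.get(i).getKey()).append(entries.get(i).getValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(hash("research", 2)); // prints "r2e2"
        System.out.println(hash("seeking", 2));  // prints "e2s1"
    }
}
```

In "research", both 'r' and 'e' occur twice; 'r' comes first in the hash because it appears first in the input.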
-
getMostFreqKSimilarity
public static int getMostFreqKSimilarity(String hash1, String hash2)
Calculate the similarity of the two hashes.
Parameters:
hash1 - the first hash string
hash2 - the second hash string
Returns:
the most frequent K similarity of the two hashes
-
getMostFreqKSimilarity
public static int getMostFreqKSimilarity(int[] hash1, int[] hash2)
Calculate the similarity of the two hashes.

    int function MostFreqKSimilarity (String inputStr1, String inputStr2, int limit)
        def int similarity
        for each c = next character from inputStr1
            lookup c in inputStr2
            if c is null continue
            // similarity += frequency of c in inputStr1
            similarity += frequency of c in inputStr1 + frequency of c in inputStr2
        // return limit - similarity
        return similarity

Parameters:
hash1 - the first hash array
hash2 - the second hash array
Returns:
the most frequent K similarity of the two hashes
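As a rough illustration of the similarity pseudocode, the sketch below sums the frequencies of every character present in both hashes. The layout of the library's int[] hash is not documented here, so this sketch assumes a hypothetical Map from character to count instead; `MostFreqKSimilaritySketch` and `similarity` are illustrative names only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch class; uses Map<Character, Integer> as an assumed
// stand-in for the library's int[] hash representation.
final class MostFreqKSimilaritySketch {

    static int similarity(Map<Character, Integer> hash1, Map<Character, Integer> hash2) {
        int similarity = 0;
        for (Map.Entry<Character, Integer> entry : hash1.entrySet()) {
            Integer other = hash2.get(entry.getKey());
            if (other == null) {
                continue; // character is not among the other hash's top K
            }
            // Add the frequency from both hashes, as in the pseudocode.
            similarity += entry.getValue() + other;
        }
        return similarity;
    }

    public static void main(String[] args) {
        Map<Character, Integer> research = new LinkedHashMap<>();
        research.put('r', 2);
        research.put('e', 2);
        Map<Character, Integer> seeking = new LinkedHashMap<>();
        seeking.put('e', 2);
        seeking.put('s', 1);
        System.out.println(similarity(research, seeking)); // prints 4
    }
}
```

Only 'e' is shared between the two hashes, contributing 2 + 2 = 4.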
-
mostFreqKSDF
public static int mostFreqKSDF(String inputStr1, String inputStr2, int K, int maxDistance)
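Wrapper function.

    int function MostFreqKSDF (string inputStr1, string inputStr2, int K, int maxDistance)
        return maxDistance - MostFreqKSimilarity(MostFreqKHashing(inputStr1,K), MostFreqKHashing(inputStr2,K))

Parameters:
inputStr1 - the first input string
inputStr2 - the second input string
K - the number of most frequent characters to hash
maxDistance - the maximum distance, from which the similarity is subtracted
Returns:
the distance, computed as maxDistance minus the similarity of the two hashes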
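To make the wrapper's arithmetic concrete, here is a self-contained sketch that composes hashing, similarity, and the final subtraction. It is an assumption-laden illustration rather than the library's code: `MostFreqKSdfSketch`, `topK`, `similarity`, and `sdf` are hypothetical names, and hashes are kept as maps instead of the library's String or int[] forms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class; not part of org.genesys.taxonomy.checker.
final class MostFreqKSdfSketch {

    // Top-k character counts; ties keep first-occurrence order (stable sort).
    static Map<Character, Integer> topK(String s, int k) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        List<Map.Entry<Character, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        Map<Character, Integer> top = new LinkedHashMap<>();
        int limit = Math.max(0, Math.min(k, entries.size()));
        for (Map.Entry<Character, Integer> e : entries.subList(0, limit)) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }

    // Sum of both frequencies for every character found in both hashes.
    static int similarity(Map<Character, Integer> h1, Map<Character, Integer> h2) {
        int similarity = 0;
        for (Map.Entry<Character, Integer> e : h1.entrySet()) {
            Integer other = h2.get(e.getKey());
            if (other != null) {
                similarity += e.getValue() + other;
            }
        }
        return similarity;
    }

    // The wrapper: maxDistance minus the similarity of the two hashes.
    static int sdf(String s1, String s2, int k, int maxDistance) {
        return maxDistance - similarity(topK(s1, k), topK(s2, k));
    }

    public static void main(String[] args) {
        System.out.println(sdf("research", "seeking", 2, 10)); // prints 6
    }
}
```

With K = 2 and maxDistance = 10, "research" and "seeking" share only 'e' (2 + 2 = 4), so the distance is 10 - 4 = 6. Strings with no shared top-K characters yield maxDistance itself.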
-
mostFreqKSDF
public static double mostFreqKSDF(String inputStr1, String inputStr2, int K)
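Most frequent K SDF, without a maxDistance argument.
Parameters:
inputStr1 - the first input string
inputStr2 - the second input string
K - the number of most frequent characters to hash
Returns:
the result as a double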
-
toHashString
public static String toHashString(int[] h1)
Encode a hash array to String.
Parameters:
h1 - hash array as generated
Returns:
String representation of the hash array (e.g. "i3b2")
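The exact layout of the hash array is not specified on this page. Purely as an assumption for illustration, the sketch below treats it as alternating (character code, count) pairs and shows how such an array could be encoded into a string like "i3b2". `HashStringSketch` is a hypothetical class, and the real method may use a different layout.

```java
// Hypothetical sketch class; the pair layout [char, count, char, count, ...]
// is an assumption, not the documented format of the library's hash array.
final class HashStringSketch {

    static String toHashString(int[] pairs) {
        StringBuilder sb = new StringBuilder();
        // Read two slots at a time: character code, then its count.
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            sb.append((char) pairs[i]).append(pairs[i + 1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] hash = {'i', 3, 'b', 2};
        System.out.println(toHashString(hash)); // prints "i3b2"
    }
}
```

Under this encoding, a digit appearing as a hashed character would be indistinguishable from a count, which may be one reason the class does not handle digits in its input.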