public class Similarity extends Object
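| Constructor and Description |
|---|
| Similarity(): Constructs an instance. |
| Similarity(int fracCount, int hammingThresh): Constructs an instance with the given number of storage segments and Hamming distance threshold. |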
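| Modifier and Type | Method and Description |
|---|---|
| boolean | equals(Collection<? extends CharSequence> segList): Checks whether the text duplicates data that has already been stored. |
| long | hash(Collection<? extends CharSequence> segList): Computes the simhash value of the given text. |
| static double | similar(String strA, String strB): Computes the similarity of two strings. Two empty strings have a similarity of 1 and are treated as identical. |
| static String | similar(String strA, String strB, int scale): Computes the similarity as a percentage. |
| void | store(Long simhash): Stores the simhash value according to the index. |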
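The hash, equals, and store methods suggest a simhash-based duplicate check. The following is a minimal, self-contained sketch of that general technique, not this class's implementation. The class name `SimhashSketch`, the method name `isDuplicate`, the FNV-1a-style token hash, the 64-bit fingerprint width, the "at most the threshold" comparison, and the example values 4 and 3 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a generic simhash sketch, not the Similarity class itself.
class SimhashSketch {
    static final int BITS = 64; // assumed fingerprint width

    private final int fracCount;     // number of segments the fingerprint is split into
    private final int hammingThresh; // largest Hamming distance still counted as a duplicate
    // One map per segment: segment value -> fingerprints that share that value.
    private final List<Map<Long, List<Long>>> index = new ArrayList<>();

    SimhashSketch(int fracCount, int hammingThresh) {
        this.fracCount = fracCount;
        this.hammingThresh = hammingThresh;
        for (int i = 0; i < fracCount; i++) {
            index.add(new HashMap<>());
        }
    }

    // FNV-1a-style 64-bit hash over chars; a stand-in chosen for brevity.
    static long tokenHash(CharSequence token) {
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < token.length(); i++) {
            h ^= token.charAt(i);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Each token votes +1 or -1 on every bit; a positive tally sets the bit.
    static long hash(Collection<? extends CharSequence> segList) {
        int[] votes = new int[BITS];
        for (CharSequence token : segList) {
            long h = tokenHash(token);
            for (int bit = 0; bit < BITS; bit++) {
                votes[bit] += ((h >>> bit) & 1L) == 1L ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int bit = 0; bit < BITS; bit++) {
            if (votes[bit] > 0) {
                fingerprint |= 1L << bit;
            }
        }
        return fingerprint;
    }

    // Number of bit positions in which the two fingerprints differ.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    // Value of the i-th segment of the fingerprint.
    private long segment(long simhash, int i) {
        int width = BITS / fracCount;
        long mask = (width == 64) ? -1L : (1L << width) - 1;
        return (simhash >>> (i * width)) & mask;
    }

    // Index the fingerprint under each of its segment values.
    void store(long simhash) {
        for (int i = 0; i < fracCount; i++) {
            index.get(i).computeIfAbsent(segment(simhash, i), k -> new ArrayList<>()).add(simhash);
        }
    }

    // Only fingerprints sharing at least one segment are compared in full.
    boolean isDuplicate(Collection<? extends CharSequence> segList) {
        long simhash = hash(segList);
        for (int i = 0; i < fracCount; i++) {
            List<Long> candidates = index.get(i).get(segment(simhash, i));
            if (candidates == null) {
                continue;
            }
            for (long candidate : candidates) {
                if (hammingDistance(simhash, candidate) <= hammingThresh) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SimhashSketch sketch = new SimhashSketch(4, 3); // assumed example values
        List<String> doc = Arrays.asList("the", "quick", "brown", "fox");
        sketch.store(hash(doc));
        System.out.println(sketch.isDuplicate(doc)); // prints "true": identical fingerprint
    }
}
```

Splitting the fingerprint into segments keeps lookups cheap: two fingerprints that differ in fewer bits than there are segments must agree on at least one whole segment, so indexing by segment value is enough to find every such candidate without scanning all stored values.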
public Similarity()

public Similarity(int fracCount, int hammingThresh)
fracCount - number of storage segments
hammingThresh - Hamming distance threshold used as the duplicate criterion

public static double similar(String strA, String strB)
strA - first string
strB - second string

public static String similar(String strA, String strB, int scale)
strA - first string
strB - second string
scale - number of decimal places to keep

public long hash(Collection<? extends CharSequence> segList)
segList - list of tokens produced by word segmentation

public boolean equals(Collection<? extends CharSequence> segList)
segList - result of segmenting the text into tokens

public void store(Long simhash)
simhash - the simhash value

Copyright © 2020. All rights reserved.