Class Simhash

java.lang.Object
org.miaixz.bus.core.codec.hash.Simhash
All Implemented Interfaces:
Encoder<Collection<? extends CharSequence>,Number>, Hash64<Collection<? extends CharSequence>>

public class Simhash extends Object implements Hash64<Collection<? extends CharSequence>>
Simhash is a locality-sensitive hash used to deduplicate very large volumes of text.

The algorithm implementation is based on: https://github.com/xlturing/Simhash4J

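Definition of locality-sensitive hashing: if two strings that are similar to some degree still retain that similarity after hashing, the hash is called locality-sensitive.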
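The idea can be illustrated with a minimal, textbook-style sketch of a 64-bit simhash. It is not this class's implementation and does not call its API. The class name SimhashSketch, the methods tokenHash, simhash and hammingDistance, and the choice of FNV-1a as the per-token hash are all assumptions made for illustration. Each token votes on every bit position; the sign of the vote total gives the fingerprint bit, and two fingerprints are compared by Hamming distance.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.List;

// Hypothetical illustration only; not the code of org.miaixz.bus.core.codec.hash.Simhash.
public class SimhashSketch {

    // Assumed per-token hash: 64-bit FNV-1a over the UTF-8 bytes of the token.
    static long tokenHash(CharSequence token) {
        long h = 0xcbf29ce484222325L;
        for (byte b : token.toString().getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Unweighted simhash: every token adds +1 or -1 to each of the 64 bit positions.
    static long simhash(Collection<? extends CharSequence> tokens) {
        int[] votes = new int[64];
        for (CharSequence token : tokens) {
            long h = tokenHash(token);
            for (int i = 0; i < 64; i++) {
                votes[i] += ((h >>> i) & 1L) == 1L ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int i = 0; i < 64; i++) {
            if (votes[i] > 0) {
                fingerprint |= (1L << i); // a positive total sets the bit
            }
        }
        return fingerprint;
    }

    // Number of differing bits between two fingerprints.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        long a = simhash(List.of("the", "quick", "brown", "fox", "jumps"));
        long b = simhash(List.of("the", "quick", "brown", "fox", "leaps"));
        System.out.println("Hamming distance: " + hammingDistance(a, b));
    }
}
```

In a deduplication setting, two texts are typically treated as near-duplicates when the Hamming distance between their fingerprints falls below a chosen threshold; the threshold itself is application-specific.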

Since:
Java 17+
Author:
Kimi Liu