PerfectHash

A perfect hash function tool. It needs about 1.4 bits per key, and the resulting hash table is about 79% full. The minimal perfect hash function needs about 2.3 bits per key.

Generating the hash function takes about 1 second per million keys for both perfect hash and minimal perfect hash.

The algorithm is recursive. Sets that contain zero or one entry are not processed, as no conflicts are possible. For sets that contain between 2 and 16 entries, up to 16 hash functions are tested to check whether one of them can store the data without conflict. If no function is found, the same test is repeated on a larger bucket (except for the minimal perfect hash). If still no hash function is found, or if the bucket contains more than 16 entries, the bucket is split into a number of smaller buckets (up to 32).
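The step for small sets can be illustrated with a minimal sketch. It is not the code used by this class: the class name SmallBucketSketch, the mixing function, and the parameters tableSize and maxTries are assumptions made for illustration. It only shows the idea of trying a limited number of hash functions until one places every key in its own slot.

```java
import java.util.BitSet;

public class SmallBucketSketch {

    // Hypothetical mixing function (not taken from PerfectHash): combines
    // the key with a function index and scrambles the bits.
    static int hash(int key, int index, int tableSize) {
        int x = key + index * 0x9E3779B9;
        x = ((x >>> 16) ^ x) * 0x45d9f3b;
        x = ((x >>> 16) ^ x) * 0x45d9f3b;
        x = (x >>> 16) ^ x;
        return Math.floorMod(x, tableSize);
    }

    // Try function indexes 0..maxTries-1 and return the first one that maps
    // every key to a distinct slot, or -1 if none of them does.
    static int findHashIndex(int[] keys, int tableSize, int maxTries) {
        for (int index = 0; index < maxTries; index++) {
            BitSet used = new BitSet(tableSize);
            boolean conflict = false;
            for (int key : keys) {
                int slot = hash(key, index, tableSize);
                if (used.get(slot)) {
                    conflict = true;
                    break;
                }
                used.set(slot);
            }
            if (!conflict) {
                return index;
            }
        }
        return -1;
    }
}
```

In this sketch, a result of -1 corresponds to the case where the caller would either retry with a larger tableSize or split the set into smaller buckets.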

At the end of the generation process, the data is compressed using a general purpose compression tool (Deflate / Huffman coding). The uncompressed data is around 1.52 bits per key (perfect hash) and 3.72 bits per key (minimal perfect hash).
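As an illustration of the compression step, the following sketch compresses and restores a byte array with the JDK classes java.util.zip.Deflater and java.util.zip.Inflater. Whether this class uses exactly these settings is an assumption; the class name DeflateSketch and the method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateSketch {

    // Compress the data with Deflate at the highest compression level.
    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restore the original bytes from Deflate-compressed data.
    static byte[] expand(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input: stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Not valid Deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```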

Please also note the MinimalPerfectHash class, which uses less space per key.

Methods
static byte[] generate(Set list, boolean minimal)
Generate the perfect hash function data from the given set of integers.
Parameters:
list - the set
minimal - whether the perfect hash function needs to be minimal
Returns:
the data
PerfectHash(byte[] data)
Create a hash object to convert keys to hashes.
Parameters:
data - the data returned by the generate method
int get(int x)
Calculate the hash from the key.
Parameters:
x - the key
Returns:
the hash
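The contract of the three members above can be checked with a small helper. The sketch below is self-contained and does not depend on this class: PerfectHashCheck and isPerfect are hypothetical names, and the assumption that a minimal perfect hash returns values in the range 0 to n-1 for n keys follows the usual definition rather than anything stated on this page. The comment shows how the documented generate, constructor, and get members would plug in.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntUnaryOperator;

public class PerfectHashCheck {

    // Returns true if the function maps every key to a distinct value.
    // If minimal is set, each value must also lie in 0..keys.size()-1
    // (assumed range for a minimal perfect hash).
    static boolean isPerfect(IntUnaryOperator hash, Set<Integer> keys,
            boolean minimal) {
        Set<Integer> seen = new HashSet<>();
        for (int key : keys) {
            int h = hash.applyAsInt(key);
            if (minimal && (h < 0 || h >= keys.size())) {
                return false;
            }
            if (!seen.add(h)) {
                return false; // two keys share the same hash
            }
        }
        return true;
    }

    // Intended use with the documented members, assuming the PerfectHash
    // class is on the classpath:
    //
    //   byte[] data = PerfectHash.generate(keys, true);
    //   PerfectHash hash = new PerfectHash(data);
    //   boolean ok = isPerfect(hash::get, keys, true);
}
```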