public final class KendallTauSequenceDistance extends Object
Kendall Tau Sequence Distance is the minimum number of adjacent swaps necessary to transform one sequence into the other. It is an edit distance with adjacent swap as the edit operation. It is applicable only if both sequences are the same length and contain the same elements, with the same number of occurrences of each.
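The applicability condition can be checked before any distance is computed. The following is a minimal sketch for Strings, not the library's own validation code: the class name ApplicabilitySketch and the method sameLengthAndElements are hypothetical, and the sketch assumes that "same elements" means each character occurs equally often in both sequences.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the documented class.
final class ApplicabilitySketch {

    /** Returns true if s1 and s2 have equal length and equal counts of every character. */
    static boolean sameLengthAndElements(String s1, String s2) {
        if (s1.length() != s2.length()) {
            return false;
        }
        // Count the occurrences of each character in s1.
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < s1.length(); i++) {
            counts.merge(s1.charAt(i), 1, Integer::sum);
        }
        // Cancel each character of s2 against the counts from s1.
        for (int i = 0; i < s2.length(); i++) {
            char c = s2.charAt(i);
            Integer remaining = counts.get(c);
            if (remaining == null) {
                return false; // s2 has a character that s1 lacks (or has more of it)
            }
            if (remaining == 1) {
                counts.remove(c);
            } else {
                counts.put(c, remaining - 1);
            }
        }
        return counts.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(sameLengthAndElements("abcdaabb", "dcbababa")); // prints true
        System.out.println(sameLengthAndElements("aab", "abb"));           // prints false
    }
}
```

The distance methods of the documented class report this condition by throwing IllegalArgumentException; the sketch returns a boolean only to keep the example short.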
As a distance metric, Kendall Tau Distance originated specifically to measure distance between permutations (i.e., sequences of unique elements). The Kendall Tau Sequence Distance implemented here extends Kendall Tau Distance to general sequences (i.e., strings that can contain duplicate elements).
Consider this example. Let s1 = "abcdaabb" and s2 = "dcbababa". The shortest sequence of adjacent swaps to edit s2 into s1 is the following sequence of 9 swaps: "cdbababa", "cbdababa", "bcdababa", "bcadbaba", "bacdbaba", "abcdbaba", "abcdabba", "abcdabab", "abcdaabb".
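The count of 9 in this example can be reproduced with a short quadratic-time sketch. This is an illustration only, not either of the two algorithms implemented in this class. It assumes that equal elements never need to cross each other, so the k-th occurrence of a character in s2 is matched to the k-th occurrence of that character in s1, and the distance is the number of inversions among the matched target positions. The class name WorkedExampleSketch is hypothetical.

```java
// Hypothetical illustration, not the library's implementation.
final class WorkedExampleSketch {

    /** Counts the adjacent swaps needed to edit s2 into s1 (quadratic time). */
    static int distance(String s1, String s2) {
        int n = s1.length();
        if (n != s2.length()) {
            throw new IllegalArgumentException("sequences are of different lengths");
        }
        // target[i] is the position in s1 where the character s2[i] must end up.
        boolean[] used = new boolean[n];
        int[] target = new int[n];
        for (int i = 0; i < n; i++) {
            int j = 0;
            // Find the leftmost position of s1 that is still unmatched and holds s2[i].
            while (j < n && (used[j] || s1.charAt(j) != s2.charAt(i))) {
                j++;
            }
            if (j == n) {
                throw new IllegalArgumentException("sequences contain different elements");
            }
            used[j] = true;
            target[i] = j;
        }
        // Each out-of-order pair of targets costs exactly one adjacent swap.
        int inversions = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (target[i] > target[j]) {
                    inversions++;
                }
            }
        }
        return inversions;
    }

    public static void main(String[] args) {
        System.out.println(distance("abcdaabb", "dcbababa")); // prints 9
    }
}
```

For the example strings, the matched target positions are 3, 2, 1, 0, 6, 4, 7, 5, which contain 9 inversions.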
This class provides implementations of two algorithms. Both apply to arrays of any primitive type (ints, longs, shorts, bytes, chars, floats, doubles, or booleans) as well as to String objects. For these inputs, the runtime of both algorithms is O(n lg n), where n is the length of the array or String.
If you are computing the distance between two arrays of Objects, the two algorithms have the
following restrictions. The default algorithm requires the objects to be of a class that
overrides the hashCode and equals methods of the Object class. The alternate
algorithm requires Objects to be of a class that implements the Comparable
interface, and overrides the equals method of the Object class. The runtime
for computing distance between arrays of objects via the default algorithm is O(h(m) n + n lg n),
where n is the array length, m is the size of the objects in the array, and h(m) is the
runtime to compute a hash of an object of size m. The runtime for the alternate algorithm for
arrays of objects is O(c(m) n lg n), where n and m are as before, and c(m) is the runtime of
the compareTo method for objects of size m. The default algorithm is the preferred algorithm
in most cases. The alternate algorithm may run faster if the cost to compare objects, c(m),
is significantly less than the cost to hash objects, h(m).
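The different requirements on element classes can be illustrated by two ways of relabeling elements as small integers. This is a hedged sketch of the general idea, not the library's code: relabelByHash needs only hashCode and equals, with one hash per element, while relabelByCompare needs only compareTo, with O(lg n) comparisons per element. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of hash-based versus comparison-based relabeling.
final class RelabelSketch {

    /** Labels elements 0, 1, 2, ... in order of first occurrence; relies on hashCode and equals. */
    static int[] relabelByHash(Object[] s) {
        Map<Object, Integer> labels = new HashMap<>();
        int[] out = new int[s.length];
        for (int i = 0; i < s.length; i++) {
            Integer label = labels.get(s[i]);
            if (label == null) {
                label = labels.size(); // next unused label
                labels.put(s[i], label);
            }
            out[i] = label;
        }
        return out;
    }

    /** Labels elements by their rank among the distinct values; relies on compareTo. */
    static <T extends Comparable<T>> int[] relabelByCompare(T[] s) {
        T[] sorted = s.clone();
        Arrays.sort(sorted);
        // rank[i] is the number of distinct values smaller than sorted[i].
        int[] rank = new int[sorted.length];
        for (int i = 1; i < sorted.length; i++) {
            rank[i] = rank[i - 1] + (sorted[i].compareTo(sorted[i - 1]) != 0 ? 1 : 0);
        }
        int[] out = new int[s.length];
        for (int i = 0; i < s.length; i++) {
            // Any index holding an equal element has the same rank.
            out[i] = rank[Arrays.binarySearch(sorted, s[i])];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] s = {"b", "a", "b"};
        System.out.println(Arrays.toString(relabelByHash(s)));    // prints [0, 1, 0]
        System.out.println(Arrays.toString(relabelByCompare(s))); // prints [1, 0, 1]
    }
}
```

The two methods assign different labels, but in both cases equal elements share a label and distinct elements do not, which is all that a subsequent swap count needs.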
Runtime: O(n lg n) for String objects and sequences of primitives, where n is the length of the sequence.
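An O(n lg n) bound of this kind is typical of inversion counting with a modified merge sort. The sketch below shows that standard technique on an int array. It is an assumption that it resembles what this class does internally, and the class name InversionCountSketch is hypothetical.

```java
// Hypothetical illustration of O(n lg n) inversion counting via merge sort.
final class InversionCountSketch {

    /** Returns the number of pairs i < j with a[i] > a[j]; the input array is not modified. */
    static long countInversions(int[] a) {
        int[] work = a.clone();
        return sortAndCount(work, new int[work.length], 0, work.length);
    }

    // Sorts a[lo..hi) and returns the number of inversions found in that range.
    private static long sortAndCount(int[] a, int[] buffer, int lo, int hi) {
        if (hi - lo < 2) {
            return 0;
        }
        int mid = (lo + hi) >>> 1;
        long count = sortAndCount(a, buffer, lo, mid) + sortAndCount(a, buffer, mid, hi);
        int i = lo;
        int j = mid;
        int k = lo;
        while (i < mid && j < hi) {
            if (a[j] < a[i]) {
                // a[j] is smaller than every element remaining in the left half.
                count += mid - i;
                buffer[k++] = a[j++];
            } else {
                buffer[k++] = a[i++];
            }
        }
        while (i < mid) {
            buffer[k++] = a[i++];
        }
        while (j < hi) {
            buffer[k++] = a[j++];
        }
        System.arraycopy(buffer, lo, a, lo, hi - lo);
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countInversions(new int[] {3, 2, 1, 0, 6, 4, 7, 5})); // prints 9
    }
}
```

The input in main is the list of target positions for the example strings s1 = "abcdaabb" and s2 = "dcbababa", so the result matches the 9 swaps listed in that example.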
If your sequences are guaranteed not to have duplicates,
and to contain the same set of elements, then consider instead using the
KendallTauDistance class, which
assumes permutations of the integers from 0 to N-1.
This distance metric and both algorithms were first described in the paper:
V.A. Cicirello, "Kendall Tau
Sequence Distance: Extending Kendall Tau from Ranks to Sequences,"
arXiv preprint arXiv:1905.02752 [cs.DM], May 2019.
| Constructor | Description |
|---|---|
| KendallTauSequenceDistance() | The KendallTauSequenceDistance class provides two algorithms. |
| KendallTauSequenceDistance(boolean useAlternateAlg) | The KendallTauSequenceDistance class provides two algorithms. |
| Modifier and Type | Method | Description |
|---|---|---|
| int | distance(boolean[] s1, boolean[] s2) | Measures the distance between two arrays. |
| int | distance(byte[] s1, byte[] s2) | Measures the distance between two arrays. |
| int | distance(char[] s1, char[] s2) | Measures the distance between two arrays. |
| int | distance(double[] s1, double[] s2) | Measures the distance between two arrays. |
| int | distance(float[] s1, float[] s2) | Measures the distance between two arrays. |
| int | distance(int[] s1, int[] s2) | Measures the distance between two arrays. |
| <T> int | distance(List<T> s1, List<T> s2) | Measures the distance between two lists of objects. |
| int | distance(long[] s1, long[] s2) | Measures the distance between two arrays. |
| int | distance(Object[] s1, Object[] s2) | Measures the distance between two arrays of objects. |
| int | distance(short[] s1, short[] s2) | Measures the distance between two arrays. |
| int | distance(String s1, String s2) | Measures the distance between two Strings. |
| double | distancef(boolean[] s1, boolean[] s2) | Measures the distance between two arrays. |
| double | distancef(byte[] s1, byte[] s2) | Measures the distance between two arrays. |
| double | distancef(char[] s1, char[] s2) | Measures the distance between two arrays. |
| double | distancef(double[] s1, double[] s2) | Measures the distance between two arrays. |
| double | distancef(float[] s1, float[] s2) | Measures the distance between two arrays. |
| double | distancef(int[] s1, int[] s2) | Measures the distance between two arrays. |
| <T> double | distancef(List<T> s1, List<T> s2) | Measures the distance between two lists of objects. |
| double | distancef(long[] s1, long[] s2) | Measures the distance between two arrays. |
| double | distancef(Object[] s1, Object[] s2) | Measures the distance between two arrays of objects. |
| double | distancef(short[] s1, short[] s2) | Measures the distance between two arrays. |
| double | distancef(String s1, String s2) | Measures the distance between two Strings. |
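public KendallTauSequenceDistance()
The KendallTauSequenceDistance class provides two algorithms. This constructor selects the default algorithm, which requires sequence elements to either be primitives or to be objects of a class that overrides the hashCode and equals methods of the Object class.

public KendallTauSequenceDistance(boolean useAlternateAlg)
The KendallTauSequenceDistance class provides two algorithms. This constructor enables you to select which algorithm to use.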
The default algorithm requires sequence elements to either be primitives (e.g.,
byte, short, int, long, char, float, double, boolean) or to be objects of a class that overrides
the hashCode and equals methods of the Object class.
The alternate algorithm requires sequence elements to either be primitives (e.g.,
byte, short, int, long, char, float, double, boolean) or to be objects of a class that implements
the Comparable interface, and overrides the equals method
of the Object class.
Under most conditions, the preferred algorithm is the default. The alternate algorithm may be desirable if the cost to compare objects is significantly less than the cost to hash objects, or if the objects are of a class that implements Comparable but which does not provide an implementation of hashCode.
Parameters:
useAlternateAlg - To use the alternate algorithm pass true. To use the default algorithm pass false.

public int distance(int[] s1, int[] s2)
Parameters:
s1 - First array.
s2 - Second array.
Throws:
IllegalArgumentException - if sequences are of different lengths, or contain different elements

public int distance(long[] s1, long[] s2)
Parameters:
s1 - First array.
s2 - Second array.
Throws:
IllegalArgumentException - if sequences are of different lengths, or contain different elements

public int distance(short[] s1, short[] s2)
Parameters:
s1 - First array.
s2 - Second array.
Throws:
IllegalArgumentException - if sequences are of different lengths, or contain different elements

public int distance(byte[] s1, byte[] s2)
Parameters:
s1 - First array.
s2 - Second array.
Throws:
IllegalArgumentException - if sequences are of different lengths, or contain different elements

public int distance(char[] s1, char[] s2)
Parameters:
s1 - First array.
s2 - Second array.
Throws:
IllegalArgumentException - if sequences are of different lengths, or contain different elements

public int distance(String s1, String s2)
Parameters:
s1 - First String.
s2 - Second String.
Throws:
IllegalArgumentException - if sequences are of different lengths, or contain different elements

public int distance(float[] s1, float[] s2)
Parameters:
s1 - First array.
s2 - Second array.
Throws:
IllegalArgumentException - if sequences are of different lengths, or contain different elements

public int distance(double[] s1, double[] s2)
Parameters:
s1 - First array.
s2 - Second array.
Throws:
IllegalArgumentException - if sequences are of different lengths, or contain different elements

public int distance(boolean[] s1, boolean[] s2)
Parameters:
s1 - First array.
s2 - Second array.
Throws:
IllegalArgumentException - if sequences are of different lengths, or contain different elements

public int distance(Object[] s1, Object[] s2)
If the distance measurer object is configured, via the constructor, to use the alternate algorithm, but the elements of the arrays passed to this method do not implement the Comparable interface, then this method disregards the choice of alternate algorithm and uses the default algorithm instead.
Parameters:
s1 - First array.
s2 - Second array.
Throws:
IllegalArgumentException - if sequences are of different lengths, or contain different elements

public <T> int distance(List<T> s1, List<T> s2)
Type Parameters:
T - Type of List elements.
Parameters:
s1 - First list.
s2 - Second list.

public final double distancef(long[] s1, long[] s2)
Specified by:
distancef in interface SequenceDistanceMeasurerDouble
Parameters:
s1 - First array.
s2 - Second array.

public final double distancef(int[] s1, int[] s2)
Specified by:
distancef in interface SequenceDistanceMeasurerDouble
Parameters:
s1 - First array.
s2 - Second array.

public final double distancef(short[] s1, short[] s2)
Specified by:
distancef in interface SequenceDistanceMeasurerDouble
Parameters:
s1 - First array.
s2 - Second array.

public final double distancef(byte[] s1, byte[] s2)
Specified by:
distancef in interface SequenceDistanceMeasurerDouble
Parameters:
s1 - First array.
s2 - Second array.

public final double distancef(char[] s1, char[] s2)
Specified by:
distancef in interface SequenceDistanceMeasurerDouble
Parameters:
s1 - First array.
s2 - Second array.

public final double distancef(double[] s1, double[] s2)
Specified by:
distancef in interface SequenceDistanceMeasurerDouble
Parameters:
s1 - First array.
s2 - Second array.

public final double distancef(float[] s1, float[] s2)
Specified by:
distancef in interface SequenceDistanceMeasurerDouble
Parameters:
s1 - First array.
s2 - Second array.

public final double distancef(boolean[] s1, boolean[] s2)
Specified by:
distancef in interface SequenceDistanceMeasurerDouble
Parameters:
s1 - First array.
s2 - Second array.

public final double distancef(String s1, String s2)
Specified by:
distancef in interface SequenceDistanceMeasurerDouble
Parameters:
s1 - First String.
s2 - Second String.

public final double distancef(Object[] s1, Object[] s2)
Specified by:
distancef in interface SequenceDistanceMeasurerDouble
Parameters:
s1 - First array.
s2 - Second array.

public final <T> double distancef(List<T> s1, List<T> s2)
Specified by:
distancef in interface SequenceDistanceMeasurerDouble
Type Parameters:
T - Type of List elements.
Parameters:
s1 - First list.
s2 - Second list.

Copyright © 2005-2020 Vincent A. Cicirello. All rights reserved.