Package ciir.umass.edu.features
Class FeatureManager
java.lang.Object
    ciir.umass.edu.features.FeatureManager
public class FeatureManager extends java.lang.Object
Constructor Summary
Constructors
FeatureManager()
-
Method Summary
static int[] getFeatureFromSampleVector(java.util.List<RankList> samples)
    Obtain all features present in a sample set.
static void main(java.lang.String[] args)
static void prepareCV(java.util.List<RankList> samples, int nFold, float tvs, java.util.List<java.util.List<RankList>> trainingData, java.util.List<java.util.List<RankList>> validationData, java.util.List<java.util.List<RankList>> testData)
    Split the input sample set into k chunks (folds) of roughly equal size and create train/test data for each fold.
static void prepareCV(java.util.List<RankList> samples, int nFold, java.util.List<java.util.List<RankList>> trainingData, java.util.List<java.util.List<RankList>> testData)
    Split the input sample set into k chunks (folds) of roughly equal size and create train/test data for each fold.
static void prepareSplit(java.util.List<RankList> samples, double percentTrain, java.util.List<RankList> trainingData, java.util.List<RankList> testData)
    Split the input sample set into 2 chunks: one for training and one for either validation or testing.
static void printQueriesForSplit(java.lang.String name, java.util.List<java.util.List<RankList>> split)
static int[] readFeature(java.lang.String featureDefFile)
    Read features specified in an input feature file.
static java.util.List<RankList> readInput(java.lang.String inputFile)
    Read a set of rankings from a single file.
static java.util.List<RankList> readInput(java.lang.String inputFile, boolean mustHaveRelDoc, boolean useSparseRepresentation)
    Read a set of rankings from a single file.
static java.util.List<RankList> readInput(java.util.List<java.lang.String> inputFiles)
    Read sets of rankings from multiple files.
static void save(java.util.List<RankList> samples, java.lang.String outputFile)
    Save a sample set to file.
Method Detail
-
main
public static void main(java.lang.String[] args)
Parameters:
args - command-line arguments
-
readInput
public static java.util.List<RankList> readInput(java.lang.String inputFile)
Read a set of rankings from a single file.
Parameters:
inputFile - the file to read the rankings from
Returns:
the rankings read from the file
-
readInput
public static java.util.List<RankList> readInput(java.lang.String inputFile, boolean mustHaveRelDoc, boolean useSparseRepresentation)
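Read a set of rankings from a single file.
Parameters:
inputFile - the file to read the rankings from
mustHaveRelDoc - if true, rankings that contain no relevant document are skipped
useSparseRepresentation - if true, data points are stored in a sparse rather than a dense representation
Returns:
the rankings read from the file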
-
readInput
public static java.util.List<RankList> readInput(java.util.List<java.lang.String> inputFiles)
Read sets of rankings from multiple files, then merge them all into a single list.
Parameters:
inputFiles - the files to read the rankings from
Returns:
the merged rankings from all input files
-
readFeature
public static int[] readFeature(java.lang.String featureDefFile)
Read features specified in an input feature file. The file is expected to contain one feature per line.
Parameters:
featureDefFile - the feature definition file
Returns:
the features read from the file
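As a rough illustration of the one-feature-per-line convention, the following self-contained sketch parses such lines into an int[]. It does not call RankLib. The class FeatureFileSketch, the method parseFeatureLines, and the decision to skip blank lines and lines starting with # are assumptions made for the example, not documented behavior of readFeature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FeatureFileSketch {
    // Hypothetical helper (not part of FeatureManager): turns the lines of a
    // feature file into an array of feature ids, one id per line.
    public static int[] parseFeatureLines(List<String> lines) {
        List<Integer> ids = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Assumption: blank lines and '#' comment lines carry no feature.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            ids.add(Integer.parseInt(trimmed));
        }
        int[] result = new int[ids.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = ids.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("# selected features", "1", "3", "", "7");
        System.out.println(Arrays.toString(parseFeatureLines(lines))); // [1, 3, 7]
    }
}
```

With the real class, the equivalent call would be FeatureManager.readFeature(featureDefFile), which takes the file path rather than the lines.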
-
getFeatureFromSampleVector
public static int[] getFeatureFromSampleVector(java.util.List<RankList> samples)
Obtain all features present in a sample set.
Parameters:
samples - the sample set to inspect
Returns:
the features present in the sample set
-
prepareCV
public static void prepareCV(java.util.List<RankList> samples, int nFold, java.util.List<java.util.List<RankList>> trainingData, java.util.List<java.util.List<RankList>> testData)
Split the input sample set into nFold chunks (folds) of roughly equal size and create train/test data for each fold. No randomization is done: to split the data randomly, shuffle the input samples before calling this method.
Parameters:
samples - the input sample set
nFold - the number of folds
trainingData - list that receives the training data for each fold
testData - list that receives the test data for each fold
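To make the order-preserving fold construction concrete, here is a minimal generic sketch. It is not the RankLib implementation: the class CrossValidationSketch, the method splitIntoFolds, and the rule that the first folds absorb any leftover samples are assumptions chosen for illustration. It mirrors the out-parameter style of prepareCV, where the caller passes in empty lists to be filled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CrossValidationSketch {
    // Hypothetical helper: fills trainingData and testData with one entry per
    // fold, keeping the input order (no shuffling).
    public static <T> void splitIntoFolds(List<T> samples, int nFold,
                                          List<List<T>> trainingData,
                                          List<List<T>> testData) {
        int base = samples.size() / nFold;
        int remainder = samples.size() % nFold;
        int start = 0;
        for (int f = 0; f < nFold; f++) {
            // Assumption: the first `remainder` folds take one extra sample.
            int end = start + base + (f < remainder ? 1 : 0);
            List<T> test = new ArrayList<>(samples.subList(start, end));
            // Everything outside [start, end) becomes training data for this fold.
            List<T> train = new ArrayList<>(samples.subList(0, start));
            train.addAll(samples.subList(end, samples.size()));
            trainingData.add(train);
            testData.add(test);
            start = end;
        }
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>(Arrays.asList("q1", "q2", "q3", "q4", "q5"));
        // For random folds, shuffle first, e.g.
        // java.util.Collections.shuffle(queries, new java.util.Random(42));
        List<List<String>> train = new ArrayList<>();
        List<List<String>> test = new ArrayList<>();
        splitIntoFolds(queries, 2, train, test);
        System.out.println(test);  // [[q1, q2, q3], [q4, q5]]
        System.out.println(train); // [[q4, q5], [q1, q2, q3]]
    }
}
```

Because the chunks are contiguous, any ordering in the input (for example, queries grouped by topic) carries straight into the folds, which is why shuffling beforehand matters when a random split is wanted.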
-
prepareCV
public static void prepareCV(java.util.List<RankList> samples, int nFold, float tvs, java.util.List<java.util.List<RankList>> trainingData, java.util.List<java.util.List<RankList>> validationData, java.util.List<java.util.List<RankList>> testData)
Split the input sample set into nFold chunks (folds) of roughly equal size and create train/test data for each fold. The training data in each fold is then further split into training and validation data. No randomization is done: to split the data randomly, shuffle the input samples before calling this method.
Parameters:
samples - the input sample set
nFold - the number of folds
tvs - train/validation split ratio
trainingData - list that receives the training data for each fold
validationData - list that receives the validation data for each fold
testData - list that receives the test data for each fold
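The extra train/validation step can be pictured with the sketch below, which works on per-fold training lists that already exist. It is an illustration only: the class TrainValidationSketch and the method carveValidation are hypothetical, and the reading of tvs as the fraction of each fold's training data that stays in the training set is an assumption, since the documentation only calls it a ratio.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrainValidationSketch {
    // Hypothetical helper: for every fold, keeps the head of its training list
    // for training and moves the tail into a new validation list.
    public static <T> void carveValidation(List<List<T>> trainingData,
                                           List<List<T>> validationData,
                                           float tvs) {
        for (int f = 0; f < trainingData.size(); f++) {
            List<T> train = trainingData.get(f);
            // Assumption: tvs is the fraction of the fold's training data kept for training.
            int keep = (int) (train.size() * tvs);
            validationData.add(new ArrayList<>(train.subList(keep, train.size())));
            trainingData.set(f, new ArrayList<>(train.subList(0, keep)));
        }
    }

    public static void main(String[] args) {
        List<List<String>> train = new ArrayList<>();
        train.add(Arrays.asList("q1", "q2", "q3", "q4"));
        train.add(Arrays.asList("q5", "q6", "q7", "q8"));
        List<List<String>> validation = new ArrayList<>();
        carveValidation(train, validation, 0.75f);
        System.out.println(train);      // [[q1, q2, q3], [q5, q6, q7]]
        System.out.println(validation); // [[q4], [q8]]
    }
}
```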
-
printQueriesForSplit
public static void printQueriesForSplit(java.lang.String name, java.util.List<java.util.List<RankList>> split)
-
prepareSplit
public static void prepareSplit(java.util.List<RankList> samples, double percentTrain, java.util.List<RankList> trainingData, java.util.List<RankList> testData)
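Split the input sample set into two chunks: one for training and one for either validation or testing.
Parameters:
samples - the input sample set
percentTrain - the percentage of data used for training
trainingData - list that receives the training data
testData - list that receives the validation or test data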
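A two-way split of this kind could look like the following sketch. The class SplitSketch and the method splitByFraction are hypothetical names, and treating percentTrain as a fraction between 0 and 1 (for example 0.75 for a 75/25 split) is an assumption; check the RankLib source before relying on either that scale or the truncating rounding used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitSketch {
    // Hypothetical helper: puts the first part of samples into trainingData
    // and the remainder into testData, preserving the input order.
    public static <T> void splitByFraction(List<T> samples, double percentTrain,
                                           List<T> trainingData, List<T> testData) {
        // Assumption: percentTrain is a fraction in [0, 1].
        if (percentTrain < 0.0 || percentTrain > 1.0) {
            throw new IllegalArgumentException("percentTrain must be in [0, 1]");
        }
        int nTrain = (int) (samples.size() * percentTrain);
        trainingData.addAll(samples.subList(0, nTrain));
        testData.addAll(samples.subList(nTrain, samples.size()));
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList("q1", "q2", "q3", "q4");
        List<String> train = new ArrayList<>();
        List<String> test = new ArrayList<>();
        splitByFraction(queries, 0.75, train, test);
        System.out.println(train); // [q1, q2, q3]
        System.out.println(test);  // [q4]
    }
}
```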
-
save
public static void save(java.util.List<RankList> samples, java.lang.String outputFile)
Save a sample set to a file.
Parameters:
samples - the sample set to save
outputFile - the file to write the samples to