Package no.oslomet.aaas.utils
Class ARXPayloadAnalyser
java.lang.Object
  no.oslomet.aaas.utils.ARXPayloadAnalyser
@Component
public class ARXPayloadAnalyser
extends java.lang.Object
Utility class that analyses a tabular data set for re-identification risk.
-
Field Summary
Fields
private static int PRECENT_CONVERT
-
Constructor Summary
Constructors
ARXPayloadAnalyser()
-
Method Summary
java.util.Map<java.lang.String,java.lang.String> getPayloadAnalysisData(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns a map containing the different statistics found from the data set.

double getPayloadAverageProsecutorRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the average prosecutor re-identification risk found in the data set, based on the defined population model.

double getPayloadEstimatedJournalistRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the estimated journalist re-identification risk found in the data set, based on the defined population model.

double getPayloadEstimatedMarketerRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the estimated marketer re-identification risk found in the data set, based on the defined population model.

double getPayloadEstimatedProsecutorRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the estimated prosecutor re-identification risk found in the data set, based on the defined population model.

double getPayloadHighestProsecutorRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the highest prosecutor re-identification risk found in the data set, based on the defined population model.

double getPayloadLowestProsecutorRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the lowest prosecutor re-identification risk found in the data set, based on the defined population model.

org.deidentifier.arx.risk.RiskModelPopulationUniqueness.PopulationUniquenessModel getPayloadPopulationModel(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the name of the method used to estimate population uniqueness, under the assumption that the data set is a uniform sample of the population.

double getPayloadPopulationUniques(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the amount of unique records/fields in the data set that are also unique within the underlying population of which the data is a part.

java.util.Set<java.lang.String> getPayloadQuasiIdentifiers(org.deidentifier.arx.DataHandle data)
Returns the set of field names in the data set whose attribute type is quasi-identifying.

double getPayloadRecordsAffectByRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel, double risk)
Returns the amount of records/fields that are affected by a specific amount of risk.

double getPayloadSampleUniques(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the amount of unique records/fields in the data set.
-
Field Detail
-
PRECENT_CONVERT
private static final int PRECENT_CONVERT
- See Also:
- Constant Field Values
-
Method Detail
-
getPayloadLowestProsecutorRisk
public double getPayloadLowestProsecutorRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the lowest prosecutor re-identification risk found in the data set, based on the defined population model.
- Parameters:
data - tabular data set to be analysed for re-identification risk
pModel - population model for our data set
- Returns:
- lowest risk found in the data set
-
getPayloadRecordsAffectByRisk
public double getPayloadRecordsAffectByRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel, double risk)
Returns the amount of records/fields that are affected by a specific amount of risk.
- Parameters:
data - tabular data set to be analysed for re-identification risk
pModel - population model for our data set that defines the population size and sampling fraction
risk - specific amount of risk that affects one or more records
- Returns:
- records affected by the specified amount of risk
-
getPayloadAverageProsecutorRisk
public double getPayloadAverageProsecutorRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the average prosecutor re-identification risk found in the data set, based on the defined population model.
- Parameters:
data - tabular data set to be analysed for re-identification risk
pModel - population model for our data set that defines the population size and sampling fraction
- Returns:
- average risk found in the data set
-
getPayloadHighestProsecutorRisk
public double getPayloadHighestProsecutorRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the highest prosecutor re-identification risk found in the data set, based on the defined population model.
- Parameters:
data - tabular data set to be analysed for re-identification risk
pModel - population model for our data set that defines the population size and sampling fraction
- Returns:
- highest risk found in the data set
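The lowest, average and highest prosecutor risks documented above can be illustrated without the ARX library. The sketch below is not this class's implementation; it assumes the common textbook definition in which a record's prosecutor risk is 1 divided by the size of its equivalence class (the group of records sharing the same quasi-identifier values). The class name `ProsecutorRiskSketch` and the `classSizes` input are hypothetical and exist only for this example.

```java
import java.util.Arrays;

/** Illustrative only: prosecutor risk statistics derived from equivalence class sizes. */
final class ProsecutorRiskSketch {

    /** Lowest risk: records in the largest class are the hardest to single out. */
    static double lowestRisk(int[] classSizes) {
        // Assumes at least one class; getAsInt() throws on an empty array.
        return 1.0 / Arrays.stream(classSizes).max().getAsInt();
    }

    /** Highest risk: records in the smallest class are the easiest to single out. */
    static double highestRisk(int[] classSizes) {
        return 1.0 / Arrays.stream(classSizes).min().getAsInt();
    }

    /**
     * Average risk over all records. Each class of size s contributes s records
     * with risk 1/s, so the sum of risks equals the number of classes.
     */
    static double averageRisk(int[] classSizes) {
        int records = Arrays.stream(classSizes).sum();
        return (double) classSizes.length / records;
    }

    public static void main(String[] args) {
        int[] classSizes = {1, 4, 5}; // hypothetical data set with 10 records in 3 classes
        System.out.println("lowest  = " + lowestRisk(classSizes));
        System.out.println("average = " + averageRisk(classSizes));
        System.out.println("highest = " + highestRisk(classSizes));
    }
}
```

Under this definition a single unique record (a class of size 1) is enough to push the highest risk to 1.0, even when the average stays low.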
-
getPayloadEstimatedProsecutorRisk
public double getPayloadEstimatedProsecutorRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the estimated prosecutor re-identification risk found in the data set, based on the defined population model.
- Parameters:
data - tabular data set to be analysed for re-identification risk
pModel - population model for our data set that defines the population size and sampling fraction
- Returns:
- estimated prosecutor risk found in the data set
-
getPayloadEstimatedJournalistRisk
public double getPayloadEstimatedJournalistRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the estimated journalist re-identification risk found in the data set, based on the defined population model.
- Parameters:
data - tabular data set to be analysed for re-identification risk
pModel - population model for our data set that defines the population size and sampling fraction
- Returns:
- estimated journalist risk found in the data set
-
getPayloadEstimatedMarketerRisk
public double getPayloadEstimatedMarketerRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the estimated marketer re-identification risk found in the data set, based on the defined population model.
- Parameters:
data - tabular data set to be analysed for re-identification risk
pModel - population model for our data set that defines the population size and sampling fraction
- Returns:
- estimated marketer risk found in the data set
-
getPayloadSampleUniques
public double getPayloadSampleUniques(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the amount of unique records/fields in the data set.
- Parameters:
data - tabular data set to be analysed for re-identification risk
pModel - population model for our data set that defines the population size and sampling fraction
- Returns:
- amount of unique records/fields found in the data set
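As a rough illustration of sample uniqueness, the sketch below counts the records that sit alone in their equivalence class. It is an assumption that the result is best expressed as a fraction of all records; this page only says the method returns a double. The class name `SampleUniquesSketch` and the `classSizes` input are hypothetical, and the ARX library is not used.

```java
import java.util.Arrays;

/** Illustrative only: share of records that are unique within the sample. */
final class SampleUniquesSketch {

    /** Fraction of records whose equivalence class has exactly one member. */
    static double fractionOfUniqueRecords(int[] classSizes) {
        // Every class of size 1 holds exactly one unique record.
        long uniqueRecords = Arrays.stream(classSizes).filter(size -> size == 1).count();
        int totalRecords = Arrays.stream(classSizes).sum();
        // An empty data set has no unique records.
        return totalRecords == 0 ? 0.0 : (double) uniqueRecords / totalRecords;
    }

    public static void main(String[] args) {
        int[] classSizes = {1, 1, 3, 5}; // hypothetical: 10 records, 2 of them unique
        System.out.println("fraction of unique records = " + fractionOfUniqueRecords(classSizes));
    }
}
```

Population uniqueness differs in that it has to be estimated from the population model, because the full population is normally not available.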
-
getPayloadPopulationUniques
public double getPayloadPopulationUniques(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the amount of unique records/fields in the data set that are also unique within the underlying population of which the data is a part.
- Parameters:
data - tabular data set to be analysed for re-identification risk
pModel - population model for our data set that defines the population size and sampling fraction
- Returns:
- amount of unique records/fields found in the data set which are also unique in the population model
-
getPayloadPopulationModel
public org.deidentifier.arx.risk.RiskModelPopulationUniqueness.PopulationUniquenessModel getPayloadPopulationModel(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns the name of the method used to estimate population uniqueness, under the assumption that the data set is a uniform sample of the population.
- Parameters:
data - tabular data set to be analysed for re-identification risk
pModel - population model for our data set that defines the population size and sampling fraction
- Returns:
- population model name
-
getPayloadQuasiIdentifiers
public java.util.Set<java.lang.String> getPayloadQuasiIdentifiers(org.deidentifier.arx.DataHandle data)
Returns the set of field names in the data set whose attribute type is quasi-identifying.
- Parameters:
data - tabular data set to be analysed for re-identification risk
- Returns:
- set of strings containing quasi-identifying fields
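The selection described above amounts to filtering field names by attribute type. The following is a minimal sketch of that idea using plain Java collections rather than the ARX `DataHandle`; the `AttributeKind` enum, the class name `QuasiIdentifierSketch` and the sample field names are all hypothetical stand-ins.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: picks out the fields marked as quasi-identifying. */
final class QuasiIdentifierSketch {

    /** Hypothetical stand-in for the attribute types a data set can assign to a field. */
    enum AttributeKind { IDENTIFYING, QUASI_IDENTIFYING, SENSITIVE, INSENSITIVE }

    /** Returns the names of all fields whose kind is QUASI_IDENTIFYING, sorted by name. */
    static Set<String> quasiIdentifiers(Map<String, AttributeKind> attributeKinds) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, AttributeKind> entry : attributeKinds.entrySet()) {
            if (entry.getValue() == AttributeKind.QUASI_IDENTIFYING) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, AttributeKind> kinds = new LinkedHashMap<>();
        kinds.put("name", AttributeKind.IDENTIFYING);
        kinds.put("zipcode", AttributeKind.QUASI_IDENTIFYING);
        kinds.put("age", AttributeKind.QUASI_IDENTIFYING);
        kinds.put("diagnosis", AttributeKind.SENSITIVE);
        System.out.println(quasiIdentifiers(kinds));
    }
}
```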
-
getPayloadAnalysisData
public java.util.Map<java.lang.String,java.lang.String> getPayloadAnalysisData(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns a map containing the different statistics found from the data set.
- Parameters:
data - tabular data set to be analysed for re-identification risk
pModel - population model for our data set that defines the population size and sampling fraction
- Returns:
- a hash map containing re-identification statistics for the data set