Package no.oslomet.aaas.analyzer
Class ARXReIdentificationRiskFactory
- java.lang.Object
-
- no.oslomet.aaas.analyzer.ARXReIdentificationRiskFactory
-
@Component public class ARXReIdentificationRiskFactory extends java.lang.Object
Utility class analysing the tabular data set against re-identification risk
-
-
Field Summary
Fields
Modifier and Type | Field | Description
private static int | PRECENT_CONVERT |
private static double | THRESHOLD |
-
Constructor Summary
Constructors
Modifier | Constructor | Description
private | ARXReIdentificationRiskFactory() |
-
Method Summary
Modifier and Type | Method | Description
private static double | averageProsecutorRisk(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks) | Returns a double that shows the average prosecutor re-identification risk found in the data set, based on the population model that is defined.
static ReIdentificationRisk | create(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel) |
private static double | estimatedJournalistRisk(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks) | Returns a double that shows the estimated journalist re-identification risk found in the data set, based on the population model that is defined.
private static double | estimatedMarketerRisk(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks) | Returns a double that shows the estimated marketer re-identification risk found in the data set, based on the population model that is defined.
private static double | estimatedProsecutorRisk(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks) | Returns a double that shows the estimated prosecutor re-identification risk found in the data set, based on the population model that is defined.
private static double | highestJournalistRisk(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary) | Returns a double that shows the highest journalist re-identification risk found in the data set, based on the population model that is defined.
private static double | highestProsecutorRisk(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks) | Returns a double that shows the highest prosecutor re-identification risk found in the data set, based on the population model that is defined.
private static double | journalistAttackerSuccessRate(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary) | Returns a double that shows the success rate of a journalist attack.
private static double | lowestProsecutorRisk(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks) | Returns a double that shows the lowest prosecutor re-identification risk found in the data set, based on the population model that is defined.
private static double | marketerAttackerSuccessRate(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary) | Returns a double that shows the success rate of a marketer attack.
private static org.deidentifier.arx.risk.RiskModelPopulationUniqueness.PopulationUniquenessModel | populationUniquenessModel(org.deidentifier.arx.risk.RiskModelPopulationUniqueness riskModelPopulationUniqueness) | Returns the name of the method used to estimate population uniqueness, which assumes that the data set is a uniform sample of the population.
private static double | populationUniques(org.deidentifier.arx.risk.RiskModelPopulationUniqueness riskModelPopulationUniqueness) | Returns a double that shows the amount of unique records/fields in the data set that are also unique within the underlying population model from which the data is drawn.
private static double | prosecutorAttackSuccessRate(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary) | Returns a double that shows the success rate of a prosecutor attack.
private static java.util.Set<java.lang.String> | quasiIdentifiers(org.deidentifier.arx.DataHandle data) | Returns a set of strings containing the names of the fields in the data set whose attribute type is quasi-identifying.
private static double | recordsAffectByRisk(org.deidentifier.arx.risk.RiskModelSampleRiskDistribution sampleRiskDistribution, double risk) | Returns a double that shows the amount of records/fields that are affected by a specific amount of risk.
private static java.util.Map<java.lang.String,java.lang.String> | reIdentificationRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel) | Returns a map containing the different statistics computed from the data set.
private static double | sampleUniques(org.deidentifier.arx.risk.RiskModelSampleUniqueness riskModelSampleUniqueness) | Returns a double that shows the amount of unique records/fields in the data set.
-
-
-
Field Detail
-
PRECENT_CONVERT
private static final int PRECENT_CONVERT
- See Also:
- Constant Field Values
-
THRESHOLD
private static final double THRESHOLD
- See Also:
- Constant Field Values
-
-
Method Detail
-
create
public static ReIdentificationRisk create(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
-
reIdentificationRisk
private static java.util.Map<java.lang.String,java.lang.String> reIdentificationRisk(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
Returns a map containing the different statistics computed from the data set.
- Parameters:
- data - tabular data set to be analysed against re-identification risk
- pModel - population model for our data set that defines the population size and sampling fraction
- Returns:
- a hash map containing data set re-identification statistics
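
As a rough illustration of how numeric risk measures could end up in a map of strings, the sketch below formats each risk as a percentage. It is not the class's actual code: the class name RiskStatisticsSketch, the map keys, the two-decimal format, and the value 100 for the conversion constant are all assumptions (the page does not show the value of PRECENT_CONVERT).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative stand-in; not the actual ARXReIdentificationRiskFactory code. */
final class RiskStatisticsSketch {

    // Assumed value: the real PRECENT_CONVERT constant is not shown on this page.
    private static final int PERCENT_CONVERT = 100;

    private RiskStatisticsSketch() {
        // Static helpers only, so no instances are needed.
    }

    /** Formats a risk in [0, 1] as a percentage string with two decimals. */
    static String toPercent(double risk) {
        return String.format(Locale.ROOT, "%.2f", risk * PERCENT_CONVERT);
    }

    /** Collects three risk measures into a map; the keys are hypothetical. */
    static Map<String, String> statistics(double lowest, double average, double highest) {
        // LinkedHashMap keeps insertion order, which makes the output predictable.
        Map<String, String> stats = new LinkedHashMap<>();
        stats.put("lowest_prosecutor_risk", toPercent(lowest));
        stats.put("average_prosecutor_risk", toPercent(average));
        stats.put("highest_prosecutor_risk", toPercent(highest));
        return stats;
    }
}
```

Returning strings rather than doubles keeps the map uniform when some entries (such as a model name or a list of field names) are not numeric, which may be why the documented return type is Map<String,String>.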
-
lowestProsecutorRisk
private static double lowestProsecutorRisk(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
Returns a double that shows the lowest prosecutor re-identification risk found in the data set, based on the population model that is defined.
- Parameters:
- riskModelSampleRisks - SampleRisks for the dataset
- Returns:
- lowest risk found in the data set
-
recordsAffectByRisk
private static double recordsAffectByRisk(org.deidentifier.arx.risk.RiskModelSampleRiskDistribution sampleRiskDistribution, double risk)
Returns a double that shows the amount of records/fields that are affected by a specific amount of risk.
- Parameters:
- sampleRiskDistribution - RiskModelSampleRiskDistribution for the dataset
- risk - specific amount of risk that affects one or more records
- Returns:
- records affected by a specific amount of risk
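
One plausible reading of "records affected by a specific amount of risk" is the fraction of records whose individual risk equals the given value. The sketch below computes that from a list of equivalence-class sizes, taking a record's risk as 1 divided by the size of its class. The class RecordsAtRiskSketch, its method, and the tolerance are hypothetical; the real method delegates to ARX's RiskModelSampleRiskDistribution and may use a different definition (for example a cumulative one).

```java
import java.util.List;

/** Illustrative stand-in; not the ARX implementation. */
final class RecordsAtRiskSketch {

    // Tolerance for comparing floating-point risks (an assumption of this sketch).
    private static final double EPSILON = 1e-9;

    private RecordsAtRiskSketch() {
    }

    /**
     * Returns the fraction of records whose risk (1 / class size) equals the
     * given risk, or 0.0 when there are no records.
     */
    static double fractionAtRisk(List<Integer> classSizes, double risk) {
        long total = 0;
        long affected = 0;
        for (int size : classSizes) {
            total += size;
            // Every record in a class of this size shares the same risk.
            if (Math.abs(1.0 / size - risk) < EPSILON) {
                affected += size;
            }
        }
        return total == 0 ? 0.0 : (double) affected / total;
    }
}
```

For class sizes 1, 1, 2 and 4 (eight records in total), two records have risk 1.0, two have risk 0.5, and four have risk 0.25.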
-
averageProsecutorRisk
private static double averageProsecutorRisk(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
Returns a double that shows the average prosecutor re-identification risk found in the data set, based on the population model that is defined.
- Parameters:
- riskModelSampleRisks - SampleRisks for the dataset
- Returns:
- average risk found in the data set
-
highestProsecutorRisk
private static double highestProsecutorRisk(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
Returns a double that shows the highest prosecutor re-identification risk found in the data set, based on the population model that is defined.
- Parameters:
- riskModelSampleRisks - SampleRisks for the dataset
- Returns:
- highest prosecutor risk found in the data set
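
To make the lowest, average, and highest prosecutor risks concrete: under the standard prosecutor model, a record's risk is 1 divided by the size of its equivalence class (the group of records sharing the same quasi-identifier values). The sketch below derives the three figures from a list of class sizes. It is a self-contained illustration with a hypothetical class name, not the ARX code that the documented methods call.

```java
import java.util.List;

/** Illustrative stand-in; not the ARX implementation. */
final class ProsecutorRiskSketch {

    private ProsecutorRiskSketch() {
    }

    /** Lowest risk: records in the largest class are hardest to single out. */
    static double lowestRisk(List<Integer> classSizes) {
        int largest = classSizes.stream().mapToInt(Integer::intValue).max()
                .orElseThrow(() -> new IllegalArgumentException("no classes"));
        return 1.0 / largest;
    }

    /** Highest risk: records in the smallest class are easiest to single out. */
    static double highestRisk(List<Integer> classSizes) {
        int smallest = classSizes.stream().mapToInt(Integer::intValue).min()
                .orElseThrow(() -> new IllegalArgumentException("no classes"));
        return 1.0 / smallest;
    }

    /**
     * Average risk over all records. Each class of size s contributes
     * s records with risk 1/s, so the sum of risks equals the number of
     * classes, and the average is classes / records.
     */
    static double averageRisk(List<Integer> classSizes) {
        if (classSizes.isEmpty()) {
            throw new IllegalArgumentException("no classes");
        }
        long records = classSizes.stream().mapToLong(Integer::longValue).sum();
        return (double) classSizes.size() / records;
    }
}
```

For class sizes 1, 2 and 5 (eight records), the lowest risk is 1/5, the highest is 1, and the average is 3/8.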
-
highestJournalistRisk
private static double highestJournalistRisk(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary)
Returns a double that shows the highest journalist re-identification risk found in the data set, based on the population model that is defined.
- Parameters:
- riskModelSampleSummary - containing summary of the dataset risks
- Returns:
- highest journalist risk found in the data set
-
estimatedProsecutorRisk
private static double estimatedProsecutorRisk(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
Returns a double that shows the estimated prosecutor re-identification risk found in the data set, based on the population model that is defined.
- Parameters:
- riskModelSampleRisks - SampleRisks for the dataset
- Returns:
- estimated prosecutor risk found in the data set
-
estimatedJournalistRisk
private static double estimatedJournalistRisk(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
Returns a double that shows the estimated journalist re-identification risk found in the data set, based on the population model that is defined.
- Parameters:
- riskModelSampleRisks - SampleRisks for the dataset
- Returns:
- estimated journalist risk found in the data set
-
estimatedMarketerRisk
private static double estimatedMarketerRisk(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
Returns a double that shows the estimated marketer re-identification risk found in the data set, based on the population model that is defined.
- Parameters:
- riskModelSampleRisks - SampleRisks for the dataset
- Returns:
- estimated marketer risk found in the data set
-
sampleUniques
private static double sampleUniques(org.deidentifier.arx.risk.RiskModelSampleUniqueness riskModelSampleUniqueness)
Returns a double that shows the amount of unique records/fields in the data set.
- Parameters:
- riskModelSampleUniqueness - RiskModelSampleUniqueness for the dataset
- Returns:
- amount of unique records/fields found in the data set
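
Sample uniqueness can be pictured as follows: group the records by their quasi-identifier values and count the records that sit alone in their group. The sketch below does this for rows given as lists of strings and returns the result as a fraction of all records. Whether the documented method returns a fraction or a count is not stated on this page, so the fraction is an assumption, as are the class and method names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in; not the ARX implementation. */
final class SampleUniquesSketch {

    private SampleUniquesSketch() {
    }

    /**
     * Returns the fraction of rows whose quasi-identifier values occur
     * exactly once in the sample, or 0.0 for an empty sample.
     * Each row holds only the quasi-identifier columns.
     */
    static double fractionOfUniqueRecords(List<List<String>> rows) {
        if (rows.isEmpty()) {
            return 0.0;
        }
        // Count how many rows share each combination of values.
        Map<List<String>, Integer> counts = new HashMap<>();
        for (List<String> row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        // A combination seen once corresponds to exactly one unique record.
        long uniques = counts.values().stream().filter(c -> c == 1).count();
        return (double) uniques / rows.size();
    }
}
```

With the rows (30, M), (30, M), (41, F) and (52, F), two of the four records are unique, giving 0.5.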
-
populationUniques
private static double populationUniques(org.deidentifier.arx.risk.RiskModelPopulationUniqueness riskModelPopulationUniqueness)
Returns a double that shows the amount of unique records/fields in the data set that are also unique within the underlying population model from which the data is drawn.
- Parameters:
- riskModelPopulationUniqueness - RiskModelPopulationUniqueness for the dataset
- Returns:
- amount of unique records/fields found in the data set which are also unique in the population model
-
populationUniquenessModel
private static org.deidentifier.arx.risk.RiskModelPopulationUniqueness.PopulationUniquenessModel populationUniquenessModel(org.deidentifier.arx.risk.RiskModelPopulationUniqueness riskModelPopulationUniqueness)
Returns the name of the method used to estimate population uniqueness, which assumes that the data set is a uniform sample of the population.
- Parameters:
- riskModelPopulationUniqueness - RiskModelPopulationUniqueness for the dataset
- Returns:
- PopulationUniquenessModel for the dataset
-
quasiIdentifiers
private static java.util.Set<java.lang.String> quasiIdentifiers(org.deidentifier.arx.DataHandle data)
Returns a set of strings containing the names of the fields in the data set whose attribute type is quasi-identifying.
- Parameters:
- data - tabular data set to be analysed against re-identification risk
- Returns:
- set of strings containing quasi-identifying fields
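
The selection of quasi-identifying fields amounts to filtering field names by attribute type. The sketch below shows that filter over a plain map from field name to type. The enum AttributeKind and the class QuasiIdentifierSketch are hypothetical stand-ins for the attribute definitions that the real method reads from the ARX DataHandle.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative stand-in; not the ARX implementation. */
final class QuasiIdentifierSketch {

    /** Hypothetical stand-in for the four attribute categories. */
    enum AttributeKind { IDENTIFYING, QUASI_IDENTIFYING, SENSITIVE, INSENSITIVE }

    private QuasiIdentifierSketch() {
    }

    /** Returns the names of all fields marked as quasi-identifying, sorted. */
    static Set<String> quasiIdentifiers(Map<String, AttributeKind> attributeKinds) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, AttributeKind> entry : attributeKinds.entrySet()) {
            if (entry.getValue() == AttributeKind.QUASI_IDENTIFYING) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

A TreeSet is used here only so that the field names come back in a stable order; the documented return type is simply Set<String>.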
-
prosecutorAttackSuccessRate
private static double prosecutorAttackSuccessRate(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary)
Returns a double that shows the success rate of a prosecutor attack.
- Parameters:
- riskModelSampleSummary - containing summary of the dataset risks
- Returns:
- attacker success rate under the prosecutor model
-
journalistAttackerSuccessRate
private static double journalistAttackerSuccessRate(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary)
Returns a double that shows the success rate of a journalist attack.
- Parameters:
- riskModelSampleSummary - containing summary of the dataset risks
- Returns:
- attacker success rate under the journalist model
-
marketerAttackerSuccessRate
private static double marketerAttackerSuccessRate(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary)
Returns a double that shows the success rate of a marketer attack.
- Parameters:
- riskModelSampleSummary - containing summary of the dataset risks
- Returns:
- attacker success rate under the marketer model
-
-