Class ARXReIdentificationRiskFactory


  • @Component
    public class ARXReIdentificationRiskFactory
    extends java.lang.Object
    Utility class that analyses a tabular data set for re-identification risk.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static int PRECENT_CONVERT  
      private static double THRESHOLD  
    • Method Summary

      Modifier and Type Method Description
      private static double averageProsecutorRisk​(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
      Returns a double that shows the average prosecutor re-identification risk found in the data set, based on the population model that is defined.
      static ReIdentificationRisk create​(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)  
      private static double estimatedJournalistRisk​(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
      Returns a double that shows the estimated journalist re-identification risk found in the data set, based on the population model that is defined.
      private static double estimatedMarketerRisk​(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
      Returns a double that shows the estimated marketer re-identification risk found in the data set, based on the population model that is defined.
      private static double estimatedProsecutorRisk​(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
      Returns a double that shows the estimated prosecutor re-identification risk found in the data set, based on the population model that is defined.
      private static double highestJournalistRisk​(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary)
      Returns a double that shows the highest journalist re-identification risk found in the data set, based on the population model that is defined.
      private static double highestProsecutorRisk​(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
      Returns a double that shows the highest prosecutor re-identification risk found in the data set, based on the population model that is defined.
      private static double journalistAttackerSuccessRate​(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary)
      Returns a double that shows the success rate of a journalist attack.
      private static double lowestProsecutorRisk​(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
      Returns a double that shows the lowest prosecutor re-identification risk found in the data set, based on the population model that is defined.
      private static double marketerAttackerSuccessRate​(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary)
      Returns a double that shows the success rate of a marketer attack.
      private static org.deidentifier.arx.risk.RiskModelPopulationUniqueness.PopulationUniquenessModel populationUniquenessModel​(org.deidentifier.arx.risk.RiskModelPopulationUniqueness riskModelPopulationUniqueness)
      Returns the model used to estimate population uniqueness, assuming that the data set is a uniform sample of the population.
      private static double populationUniques​(org.deidentifier.arx.risk.RiskModelPopulationUniqueness riskModelPopulationUniqueness)
      Returns a double that shows the amount of unique records/fields in the data set that are also unique within the underlying population model of which the data set is a part.
      private static double prosecutorAttackSuccessRate​(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary)
      Returns a double that shows the success rate of a prosecutor attack.
      private static java.util.Set<java.lang.String> quasiIdentifiers​(org.deidentifier.arx.DataHandle data)
      Returns the set of field names in the data set whose attribute type is quasi-identifying.
      private static double recordsAffectByRisk​(org.deidentifier.arx.risk.RiskModelSampleRiskDistribution sampleRiskDistribution, double risk)
      Returns a double that shows the amount of records/fields that are affected by a specific amount of risk.
      private static java.util.Map<java.lang.String,​java.lang.String> reIdentificationRisk​(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
      Returns a map containing the different statistics found from the data set.
      private static double sampleUniques​(org.deidentifier.arx.risk.RiskModelSampleUniqueness riskModelSampleUniqueness)
      Returns a double that shows the amount of unique records/fields in the data set.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • ARXReIdentificationRiskFactory

        private ARXReIdentificationRiskFactory()
    • Method Detail

      • create

        public static ReIdentificationRisk create​(org.deidentifier.arx.DataHandle data,
                                                  org.deidentifier.arx.ARXPopulationModel pModel)
      • reIdentificationRisk

        private static java.util.Map<java.lang.String,​java.lang.String> reIdentificationRisk​(org.deidentifier.arx.DataHandle data,
                                                                                                   org.deidentifier.arx.ARXPopulationModel pModel)
        Returns a map containing the different statistics found from the data set.
        Parameters:
        data - tabular data set to be analysed against re-identification risk
        pModel - population model for our data set that defines the population size and sampling fraction
        Returns:
        a hash map containing data set re-identification statistics
      • lowestProsecutorRisk

        private static double lowestProsecutorRisk​(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
        Returns a double that shows the lowest prosecutor re-identification risk found in the data set, based on the population model that is defined.
        Parameters:
        riskModelSampleRisks - SampleRisks for the dataset
        Returns:
        lowest risk found in the data set
      • recordsAffectByRisk

        private static double recordsAffectByRisk​(org.deidentifier.arx.risk.RiskModelSampleRiskDistribution sampleRiskDistribution,
                                                  double risk)
        Returns a double that shows the amount of records/fields that are affected by a specific amount of risk.
        Parameters:
        sampleRiskDistribution - RiskModelSampleRiskDistribution for the dataset
        risk - specific amount of risk that affects one or more records
        Returns:
        records affected by a specific amount of risk
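As a rough illustration of what "records affected by a specific amount of risk" can mean, the sketch below assumes the prosecutor model, where each record's risk is 1 divided by the size of its equivalence class. The class `RecordsAtRiskSketch`, its method, and the input list are hypothetical and are not part of this class or of ARX; the real method delegates to `RiskModelSampleRiskDistribution`.

```java
import java.util.List;

/** Illustrative sketch only; not the implementation behind recordsAffectByRisk. */
final class RecordsAtRiskSketch {

    /**
     * Returns the fraction of records whose prosecutor risk (1 / class size)
     * equals the given risk, within a small tolerance.
     *
     * @param classSizes sizes of the equivalence classes (hypothetical input)
     * @param risk       the risk level to look for, in the range (0, 1]
     */
    static double fractionAtRisk(List<Integer> classSizes, double risk) {
        int total = 0;
        int affected = 0;
        for (int size : classSizes) {
            total += size;
            // Every record in a class of this size carries risk 1 / size.
            if (Math.abs(1.0 / size - risk) < 1e-9) {
                affected += size;
            }
        }
        return total == 0 ? 0.0 : (double) affected / total;
    }

    public static void main(String[] args) {
        // Two singleton classes and two classes of four: ten records in total.
        List<Integer> sizes = List.of(1, 1, 4, 4);
        System.out.println(fractionAtRisk(sizes, 1.0));
        System.out.println(fractionAtRisk(sizes, 0.25));
    }
}
```

With these assumed class sizes, two of the ten records carry a risk of 1.0 and the remaining eight carry a risk of 0.25.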
      • averageProsecutorRisk

        private static double averageProsecutorRisk​(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
        Returns a double that shows the average prosecutor re-identification risk found in the data set, based on the population model that is defined.
        Parameters:
        riskModelSampleRisks - SampleRisks for the dataset
        Returns:
        average risk found in the data set
      • highestProsecutorRisk

        private static double highestProsecutorRisk​(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
        Returns a double that shows the highest prosecutor re-identification risk found in the data set, based on the population model that is defined.
        Parameters:
        riskModelSampleRisks - SampleRisks for the dataset
        Returns:
        highest prosecutor risk found in the data set
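The lowest, average, and highest prosecutor risks can be pictured in terms of equivalence classes, that is, groups of records sharing the same quasi-identifier values. The sketch below is a minimal illustration under that standard definition; the class `ProsecutorRiskSketch` and its methods are hypothetical names, and the actual values here are obtained from ARX's `RiskModelSampleRisks`.

```java
import java.util.List;

/** Illustrative sketch only; not the ARX implementation. */
final class ProsecutorRiskSketch {

    /** Lowest risk: the risk of a record in the largest equivalence class. */
    static double lowest(List<Integer> classSizes) {
        int max = classSizes.stream().mapToInt(Integer::intValue).max().orElseThrow();
        return 1.0 / max;
    }

    /** Highest risk: the risk of a record in the smallest equivalence class. */
    static double highest(List<Integer> classSizes) {
        int min = classSizes.stream().mapToInt(Integer::intValue).min().orElseThrow();
        return 1.0 / min;
    }

    /**
     * Average risk over all records. Each class of size s contributes
     * s records with risk 1 / s, so the sum of risks equals the number of classes.
     */
    static double average(List<Integer> classSizes) {
        if (classSizes.isEmpty()) {
            throw new IllegalArgumentException("no equivalence classes");
        }
        int records = classSizes.stream().mapToInt(Integer::intValue).sum();
        return (double) classSizes.size() / records;
    }

    public static void main(String[] args) {
        // Hypothetical class sizes: three classes covering ten records.
        List<Integer> sizes = List.of(1, 4, 5);
        System.out.println(lowest(sizes));  // 0.2
        System.out.println(highest(sizes)); // 1.0
        System.out.println(average(sizes)); // 0.3
    }
}
```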
      • highestJournalistRisk

        private static double highestJournalistRisk​(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary)
        Returns a double that shows the highest journalist re-identification risk found in the data set, based on the population model that is defined.
        Parameters:
        riskModelSampleSummary - containing summary of the dataset risks
        Returns:
        highest journalist risk found in the data set
      • estimatedProsecutorRisk

        private static double estimatedProsecutorRisk​(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
        Returns a double that shows the estimated prosecutor re-identification risk found in the data set, based on the population model that is defined.
        Parameters:
        riskModelSampleRisks - SampleRisks for the dataset
        Returns:
        estimated prosecutor risk found in the data set
      • estimatedJournalistRisk

        private static double estimatedJournalistRisk​(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
        Returns a double that shows the estimated journalist re-identification risk found in the data set, based on the population model that is defined.
        Parameters:
        riskModelSampleRisks - SampleRisks for the dataset
        Returns:
        estimated journalist risk found in the data set
      • estimatedMarketerRisk

        private static double estimatedMarketerRisk​(org.deidentifier.arx.risk.RiskModelSampleRisks riskModelSampleRisks)
        Returns a double that shows the estimated marketer re-identification risk found in the data set, based on the population model that is defined.
        Parameters:
        riskModelSampleRisks - SampleRisks for the dataset
        Returns:
        estimated marketer risk found in the data set
      • sampleUniques

        private static double sampleUniques​(org.deidentifier.arx.risk.RiskModelSampleUniqueness riskModelSampleUniqueness)
        Returns a double that shows the amount of unique records/fields in the data set.
        Parameters:
        riskModelSampleUniqueness - RiskModelSampleUniqueness for the dataset
        Returns:
        amount of unique records/fields found in the data set
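To make the notion of sample uniqueness concrete, the following sketch groups rows by their quasi-identifier values and reports the fraction of rows that are alone in their group. It assumes the result is expressed as a fraction of all records; the class `SampleUniquesSketch` and the row representation are hypothetical, and the real method reads its value from `RiskModelSampleUniqueness`.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Illustrative sketch only; not the implementation behind sampleUniques. */
final class SampleUniquesSketch {

    /**
     * Returns the fraction of rows whose combination of quasi-identifier
     * values occurs exactly once in the sample.
     *
     * @param rows each row holds only its quasi-identifier values (hypothetical input)
     */
    static double fractionUnique(List<List<String>> rows) {
        if (rows.isEmpty()) {
            return 0.0;
        }
        // Group identical value combinations and count the size of each group.
        Map<List<String>, Long> classSizes = rows.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // A group of size one corresponds to exactly one unique record.
        long unique = classSizes.values().stream().filter(n -> n == 1L).count();
        return (double) unique / rows.size();
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("30", "M"),
                List.of("30", "M"),
                List.of("41", "F"),
                List.of("52", "M"));
        System.out.println(fractionUnique(rows)); // 0.5
    }
}
```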
      • populationUniques

        private static double populationUniques​(org.deidentifier.arx.risk.RiskModelPopulationUniqueness riskModelPopulationUniqueness)
        Returns a double that shows the amount of unique records/fields in the data set that are also unique within the underlying population model of which the data set is a part.
        Parameters:
        riskModelPopulationUniqueness - RiskModelPopulationUniqueness for the dataset
        Returns:
        amount of unique records/fields found in the data set which are also unique in the population model
      • populationUniquenessModel

        private static org.deidentifier.arx.risk.RiskModelPopulationUniqueness.PopulationUniquenessModel populationUniquenessModel​(org.deidentifier.arx.risk.RiskModelPopulationUniqueness riskModelPopulationUniqueness)
        Returns the model used to estimate population uniqueness, assuming that the data set is a uniform sample of the population.
        Parameters:
        riskModelPopulationUniqueness - RiskModelPopulationUniqueness for the dataset
        Returns:
        PopulationUniquenessModel for the dataset
      • quasiIdentifiers

        private static java.util.Set<java.lang.String> quasiIdentifiers​(org.deidentifier.arx.DataHandle data)
        Returns the set of field names in the data set whose attribute type is quasi-identifying.
        Parameters:
        data - tabular data set to be analysed against re-identification risk
        Returns:
        set of strings containing quasi-identifying fields
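The selection described above amounts to filtering field names by attribute type. The sketch below shows that idea on a plain map; the enum `Kind`, the class `QuasiIdentifierSketch`, and the sample field names are hypothetical stand-ins for the attribute types carried by the `DataHandle`, not the ARX API itself.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch only; not the implementation behind quasiIdentifiers. */
final class QuasiIdentifierSketch {

    /** Hypothetical stand-in for the attribute types assigned to each field. */
    enum Kind { IDENTIFYING, QUASI_IDENTIFYING, SENSITIVE, INSENSITIVE }

    /** Returns the names of all fields marked as quasi-identifying, sorted by name. */
    static Set<String> quasiIdentifiers(Map<String, Kind> attributeKinds) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Kind> entry : attributeKinds.entrySet()) {
            if (entry.getValue() == Kind.QUASI_IDENTIFYING) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Kind> kinds = Map.of(
                "name", Kind.IDENTIFYING,
                "age", Kind.QUASI_IDENTIFYING,
                "zipcode", Kind.QUASI_IDENTIFYING,
                "diagnosis", Kind.SENSITIVE);
        System.out.println(quasiIdentifiers(kinds)); // [age, zipcode]
    }
}
```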
      • prosecutorAttackSuccessRate

        private static double prosecutorAttackSuccessRate​(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary)
        Returns a double that shows the success rate of a prosecutor attack.
        Parameters:
        riskModelSampleSummary - containing summary of the dataset risks
        Returns:
        attacker success rate of a prosecutor attack
      • journalistAttackerSuccessRate

        private static double journalistAttackerSuccessRate​(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary)
        Returns a double that shows the success rate of a journalist attack.
        Parameters:
        riskModelSampleSummary - containing summary of the dataset risks
        Returns:
        attacker success rate of a journalist attack
      • marketerAttackerSuccessRate

        private static double marketerAttackerSuccessRate​(org.deidentifier.arx.risk.RiskModelSampleSummary riskModelSampleSummary)
        Returns a double that shows the success rate of a marketer attack.
        Parameters:
        riskModelSampleSummary - containing summary of the dataset risks
        Returns:
        attacker success rate of a marketer attack