Class ARXPayloadAnalyser


  • @Component
    public class ARXPayloadAnalyser
    extends java.lang.Object
    Utility class that analyses a tabular data set for re-identification risk.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static int PRECENT_CONVERT  
    • Method Summary

      Modifier and Type Method Description
      java.util.Map<java.lang.String,​java.lang.String> getPayloadAnalysisData​(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
      Returns a map containing the different statistics found from the data set.
      double getPayloadAverageProsecutorRisk​(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
      Returns a double that shows the average prosecutor re-identification risk found in the data set, based on the population model that is defined.
      double getPayloadEstimatedJournalistRisk​(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
      Returns a double that shows the estimated journalist re-identification risk found in the data set, based on the population model that is defined.
      double getPayloadEstimatedMarketerRisk​(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
      Returns a double that shows the estimated marketer re-identification risk found in the data set, based on the population model that is defined.
      double getPayloadEstimatedProsecutorRisk​(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
      Returns a double that shows the estimated prosecutor re-identification risk found in the data set, based on the population model that is defined.
      double getPayloadHighestProsecutorRisk​(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
      Returns a double that shows the highest prosecutor re-identification risk found in the data set, based on the population model that is defined.
      double getPayloadLowestProsecutorRisk​(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
      Returns a double that shows the lowest prosecutor re-identification risk found in the data set, based on the population model that is defined.
      org.deidentifier.arx.risk.RiskModelPopulationUniqueness.PopulationUniquenessModel getPayloadPopulationModel​(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
      Returns the name of the method used to estimate population uniqueness, assuming that the data set is a uniform sample of the population.
      double getPayloadPopulationUniques​(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
      Returns a double that shows the amount of unique records/fields in the data set which are also unique within the underlying population model of which the data is a part.
      java.util.Set<java.lang.String> getPayloadQuasiIdentifiers​(org.deidentifier.arx.DataHandle data)
      Returns a set of strings containing the names of the fields in the data set whose attribute type is quasi-identifying.
      double getPayloadRecordsAffectByRisk​(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel, double risk)
      Returns a double that shows the amount of records/fields that are affected by a specific amount of risk.
      double getPayloadSampleUniques​(org.deidentifier.arx.DataHandle data, org.deidentifier.arx.ARXPopulationModel pModel)
      Returns a double that shows the amount of unique records/fields in the data set.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • ARXPayloadAnalyser

        public ARXPayloadAnalyser()
    • Method Detail

      • getPayloadLowestProsecutorRisk

        public double getPayloadLowestProsecutorRisk​(org.deidentifier.arx.DataHandle data,
                                                     org.deidentifier.arx.ARXPopulationModel pModel)
        Returns a double that shows the lowest prosecutor re-identification risk found in the data set, based on the population model that is defined.
        Parameters:
        data - tabular data set to be analysed against re-identification risk
        pModel - population model for our data set that defines the population size and sampling fraction
        Returns:
        lowest risk found in the data set
      • getPayloadRecordsAffectByRisk

        public double getPayloadRecordsAffectByRisk​(org.deidentifier.arx.DataHandle data,
                                                    org.deidentifier.arx.ARXPopulationModel pModel,
                                                    double risk)
        Returns a double that shows the amount of records/fields that are affected by a specific amount of risk.
        Parameters:
        data - tabular data set to be analysed against re-identification risk
        pModel - population model for our data set that defines the population size and sampling fraction
        risk - specific amount of risk that affects one or more records
        Returns:
        records affected by a specific amount of risk
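
To make the idea of records affected by a given risk concrete, here is a minimal sketch in plain Java that does not use the ARX library. It assumes the prosecutor model, where a record's risk is 1 divided by the size of its equivalence class, and it assumes that "affected" means a risk at or above the threshold, reported as a fraction of all records. Both points are assumptions, as is the name `RecordsAtRiskSketch`; the documented method may compare or scale its result differently.

```java
// Hypothetical illustration; not part of ARXPayloadAnalyser or the ARX library.
final class RecordsAtRiskSketch {

    // classSizes holds the number of records in each equivalence class.
    // Returns the fraction of records whose risk (1 / class size) is >= threshold.
    static double fractionAtRisk(int[] classSizes, double threshold) {
        long total = 0;
        long atRisk = 0;
        for (int size : classSizes) {
            total += size;
            if (1.0 / size >= threshold) {
                atRisk += size; // every record in this class carries the same risk
            }
        }
        return total == 0 ? 0.0 : (double) atRisk / total;
    }
}
```

With class sizes 3 and 1 and a threshold of 0.5, only the single-record class qualifies, so one of four records is counted.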
      • getPayloadAverageProsecutorRisk

        public double getPayloadAverageProsecutorRisk​(org.deidentifier.arx.DataHandle data,
                                                      org.deidentifier.arx.ARXPopulationModel pModel)
        Returns a double that shows the average prosecutor re-identification risk found in the data set, based on the population model that is defined.
        Parameters:
        data - tabular data set to be analysed against re-identification risk
        pModel - population model for our data set that defines the population size and sampling fraction
        Returns:
        average risk found in the data set
      • getPayloadHighestProsecutorRisk

        public double getPayloadHighestProsecutorRisk​(org.deidentifier.arx.DataHandle data,
                                                      org.deidentifier.arx.ARXPopulationModel pModel)
        Returns a double that shows the highest prosecutor re-identification risk found in the data set, based on the population model that is defined.
        Parameters:
        data - tabular data set to be analysed against re-identification risk
        pModel - population model for our data set that defines the population size and sampling fraction
        Returns:
        highest risk found in the data set
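
The lowest, average and highest prosecutor risks can be illustrated without the ARX library. The sketch below assumes the standard prosecutor model, in which each record's risk is 1 divided by the size of its equivalence class (the group of records that share the same quasi-identifier values). `ProsecutorRiskSketch` and its method names are hypothetical, and the class documented here may compute or scale its values differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of ARXPayloadAnalyser or the ARX library.
final class ProsecutorRiskSketch {

    // Groups rows by their quasi-identifier values and returns the size of each group.
    static int[] classSizes(String[][] rows) {
        Map<List<String>, Integer> counts = new LinkedHashMap<>();
        for (String[] row : rows) {
            counts.merge(Arrays.asList(row), 1, Integer::sum);
        }
        return counts.values().stream().mapToInt(Integer::intValue).toArray();
    }

    // Lowest risk: records in the largest class are the hardest to single out.
    static double lowestRisk(int[] sizes) {
        return 1.0 / Arrays.stream(sizes).max().getAsInt();
    }

    // Highest risk: records in the smallest class are the easiest to single out.
    static double highestRisk(int[] sizes) {
        return 1.0 / Arrays.stream(sizes).min().getAsInt();
    }

    // Average risk over all records: number of classes divided by number of records.
    static double averageRisk(int[] sizes) {
        return (double) sizes.length / Arrays.stream(sizes).sum();
    }
}
```

The average follows from summing 1 / size over every record: each class contributes exactly 1 to the sum, so the total equals the number of classes.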
      • getPayloadEstimatedProsecutorRisk

        public double getPayloadEstimatedProsecutorRisk​(org.deidentifier.arx.DataHandle data,
                                                        org.deidentifier.arx.ARXPopulationModel pModel)
        Returns a double that shows the estimated prosecutor re-identification risk found in the data set, based on the population model that is defined.
        Parameters:
        data - tabular data set to be analysed against re-identification risk
        pModel - population model for our data set that defines the population size and sampling fraction
        Returns:
        estimated prosecutor risk found in the data set
      • getPayloadEstimatedJournalistRisk

        public double getPayloadEstimatedJournalistRisk​(org.deidentifier.arx.DataHandle data,
                                                        org.deidentifier.arx.ARXPopulationModel pModel)
        Returns a double that shows the estimated journalist re-identification risk found in the data set, based on the population model that is defined.
        Parameters:
        data - tabular data set to be analysed against re-identification risk
        pModel - population model for our data set that defines the population size and sampling fraction
        Returns:
        estimated journalist risk found in the data set
      • getPayloadEstimatedMarketerRisk

        public double getPayloadEstimatedMarketerRisk​(org.deidentifier.arx.DataHandle data,
                                                      org.deidentifier.arx.ARXPopulationModel pModel)
        Returns a double that shows the estimated marketer re-identification risk found in the data set, based on the population model that is defined.
        Parameters:
        data - tabular data set to be analysed against re-identification risk
        pModel - population model for our data set that defines the population size and sampling fraction
        Returns:
        estimated marketer risk found in the data set
      • getPayloadSampleUniques

        public double getPayloadSampleUniques​(org.deidentifier.arx.DataHandle data,
                                              org.deidentifier.arx.ARXPopulationModel pModel)
        Returns a double that shows the amount of unique records/fields in the data set.
        Parameters:
        data - tabular data set to be analysed against re-identification risk
        pModel - population model for our data set that defines the population size and sampling fraction
        Returns:
        amount of unique records/fields found in the data set
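
As a rough illustration of sample uniqueness, the plain-Java sketch below treats a record as unique when its equivalence class contains only that record, and reports the share of such records. The class name `SampleUniquesSketch` is hypothetical, and whether the documented method returns a fraction, a percentage or a count is not stated in this page; a fraction is assumed here.

```java
// Hypothetical illustration; not part of ARXPayloadAnalyser or the ARX library.
final class SampleUniquesSketch {

    // classSizes holds the number of records in each equivalence class.
    // Returns the fraction of records that sit alone in their class.
    static double fractionUnique(int[] classSizes) {
        long total = 0;
        long unique = 0;
        for (int size : classSizes) {
            total += size;
            if (size == 1) {
                unique++; // a class of size 1 contributes one unique record
            }
        }
        return total == 0 ? 0.0 : (double) unique / total;
    }
}
```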
      • getPayloadPopulationUniques

        public double getPayloadPopulationUniques​(org.deidentifier.arx.DataHandle data,
                                                  org.deidentifier.arx.ARXPopulationModel pModel)
        Returns a double that shows the amount of unique records/fields in the data set which are also unique within the underlying population model of which the data is a part.
        Parameters:
        data - tabular data set to be analysed against re-identification risk
        pModel - population model for our data set that defines the population size and sampling fraction
        Returns:
        amount of unique records/fields found in the data set which are also unique in the population model
      • getPayloadPopulationModel

        public org.deidentifier.arx.risk.RiskModelPopulationUniqueness.PopulationUniquenessModel getPayloadPopulationModel​(org.deidentifier.arx.DataHandle data,
                                                                                                                           org.deidentifier.arx.ARXPopulationModel pModel)
        Returns the name of the method used to estimate population uniqueness, assuming that the data set is a uniform sample of the population.
        Parameters:
        data - tabular data set to be analysed against re-identification risk
        pModel - population model for our data set that defines the population size and sampling fraction
        Returns:
        population model name
      • getPayloadQuasiIdentifiers

        public java.util.Set<java.lang.String> getPayloadQuasiIdentifiers​(org.deidentifier.arx.DataHandle data)
        Returns a set of strings containing the names of the fields in the data set whose attribute type is quasi-identifying.
        Parameters:
        data - tabular data set to be analysed against re-identification risk
        Returns:
        set of strings containing quasi-identifying fields
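
The selection of quasi-identifying fields can be pictured as a filter over a field-to-type mapping. The sketch below is plain Java with a hypothetical `Kind` enum standing in for the attribute type of each field; it does not call the ARX API, and the names `QuasiIdentifierSketch` and `quasiIdentifiers` are assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; not part of ARXPayloadAnalyser or the ARX library.
final class QuasiIdentifierSketch {

    // Stand-in for the attribute type assigned to each field.
    enum Kind { IDENTIFYING, QUASI_IDENTIFYING, SENSITIVE, INSENSITIVE }

    // Returns the names of all fields marked as quasi-identifying, in sorted order.
    static Set<String> quasiIdentifiers(Map<String, Kind> fieldKinds) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Kind> entry : fieldKinds.entrySet()) {
            if (entry.getValue() == Kind.QUASI_IDENTIFYING) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```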
      • getPayloadAnalysisData

        public java.util.Map<java.lang.String,​java.lang.String> getPayloadAnalysisData​(org.deidentifier.arx.DataHandle data,
                                                                                             org.deidentifier.arx.ARXPopulationModel pModel)
        Returns a map containing the different statistics found from the data set.
        Parameters:
        data - tabular data set to be analysed against re-identification risk
        pModel - population model for our data set that defines the population size and sampling fraction
        Returns:
        a hash map containing data set re-identification statistics
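
This page does not list the keys of the returned map or the format of its values. The sketch below shows one way such a `Map<String, String>` could be assembled from risk values that have already been computed. The key names, the percentage formatting and the class `AnalysisMapSketch` are all assumptions made for illustration, not the behaviour of `getPayloadAnalysisData`.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration; not part of ARXPayloadAnalyser or the ARX library.
final class AnalysisMapSketch {

    // Formats a fraction in [0, 1] as a percentage string with two decimals.
    static String asPercent(double fraction) {
        return String.format(Locale.ROOT, "%.2f", fraction * 100.0);
    }

    // Collects the statistics under hypothetical keys, preserving insertion order.
    static Map<String, String> summarise(double lowest, double average, double highest) {
        Map<String, String> stats = new LinkedHashMap<>();
        stats.put("lowest_prosecutor_risk", asPercent(lowest));
        stats.put("average_prosecutor_risk", asPercent(average));
        stats.put("highest_prosecutor_risk", asPercent(highest));
        return stats;
    }
}
```

Storing every value as a string keeps the map uniform for serialisation, at the cost of the caller having to parse numbers back out.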