org.pmml4s.model
PMML is a standard for XML documents which express trained instances of analytic models. The following classes of model are addressed:
- Association Rules, implemented by org.pmml4s.model.AssociationModel
- Baseline Models, NOT IMPLEMENTED
- Bayesian Network, NOT IMPLEMENTED
- Center-Based & Distribution-Based Clustering, implemented by org.pmml4s.model.ClusteringModel
- Gaussian Process, NOT IMPLEMENTED
- General Regression, implemented by org.pmml4s.model.GeneralRegressionModel
- k-Nearest Neighbors, implemented by org.pmml4s.model.NearestNeighborModel
- Naive Bayes, implemented by org.pmml4s.model.NaiveBayesModel
- Neural Networks, implemented by org.pmml4s.model.NeuralNetwork
- Regression, implemented by org.pmml4s.model.RegressionModel
- Ruleset, implemented by org.pmml4s.model.RuleSetModel
- Scorecard, implemented by org.pmml4s.model.Scorecard
- Sequences, NOT IMPLEMENTED
- Text, NOT IMPLEMENTED
- Time Series, NOT IMPLEMENTED
- Decision Trees, implemented by org.pmml4s.model.TreeModel
- Support Vector Machine, implemented by org.pmml4s.model.SupportVectorMachineModel
Type members
Classlikes
Defines model types used by the anomaly model.
Holds attributes of an Anomaly Detection Model.
Anomaly detection (also outlier detection) is the identification of items, events, or observations which do not conform to an expected pattern or to other items in a data set. Traditional approaches comprise distance- and density-based methods. Common ways to define distance or density are the distance to the k-nearest neighbors or the count of points within a given fixed radius. These methods, however, are unable to handle data sets with regions of different densities and do not scale well to large data. Other algorithms have been proposed which are better able to handle such cases; the PMML standard at this time supports three such algorithms:
- Isolation Forest
- One Class SVM
- Clustering mean distance based anomaly detection model
- Other models can also be used if their scoring follows PMML standard rules.
The Association Rule model represents rules where some set of items is associated with another set of items. For example, a rule can express that a certain product or set of products is often bought in combination with a certain set of other products, also known as Market Basket Analysis. An Association Rule model typically has two variables: one for grouping records together into transactions (usageType="group") and another that uniquely identifies each record (usageType="active"). Alternatively, association rule models can be built on regular data, where each category of each categorical field is an item. Yet another possible format of data is a table with true/false values, where only the fields having a true value in a record are considered valid items.
An Association Rule model consists of four major parts:
- Model attributes
- Items
- ItemSets
- AssociationRules
We consider association rules of the form "A → C", where the antecedent itemset A implies the consequent itemset C.
- Value parameters:
- affinity
Also known as Jaccard Similarity, affinity is a measure of the transactions that contain both the antecedent and consequent (intersect) compared to those that contain the antecedent or the consequent (union): affinity(A->C) = support(A+C) / [ support(A) + support(C) - support(A+C)]
- antecedent
The id value of the itemset which is the antecedent of the rule. We represent the itemset by the letter A.
- confidence
The confidence of the rule: confidence(A->C) = support(A+C) / support(A)
- consequent
The id value of the itemset which is the consequent of the rule. We represent the itemset by the letter C.
- id
An identification to uniquely identify an association rule.
- leverage
Another measure of interestingness is leverage. An association with higher frequency and lower lift may be more interesting than an alternative rule with lower frequency and higher lift. The former can be more important in practice because it applies to more cases. The value is the difference between the observed frequency of A+C and the frequency that would be expected if A and C were independent: leverage(A->C) = support(A->C) - support(A)*support(C)
- lift
A very popular measure of interestingness of a rule is lift. Lift values greater than 1.0 indicate that transactions containing A tend to contain C more often than transactions that do not contain A: lift(A->C) = confidence(A->C) / support(C)
- support
The support of the rule, that is, the relative frequency of transactions that contain A and C: support(A->C) = support(A+C)
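The measures above can be computed directly from raw transaction counts. A minimal illustrative sketch (not the PMML4S API; names are hypothetical):

```python
# Illustrative: computing the rule measures defined above from transaction
# counts. n_a, n_c, n_ac are the numbers of transactions containing the
# antecedent A, the consequent C, and both; n is the total number of
# transactions.

def rule_measures(n_a, n_c, n_ac, n):
    support_a = n_a / n
    support_c = n_c / n
    support_ac = n_ac / n                # support(A+C), the support of the rule
    confidence = support_ac / support_a  # support(A+C) / support(A)
    lift = confidence / support_c        # confidence(A->C) / support(C)
    leverage = support_ac - support_a * support_c
    affinity = support_ac / (support_a + support_c - support_ac)  # Jaccard
    return {"support": support_ac, "confidence": confidence,
            "lift": lift, "leverage": leverage, "affinity": affinity}

# e.g. 1000 transactions, A in 200, C in 250, both in 100
m = rule_measures(200, 250, 100, 1000)
```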
Input attributes for each scorecard characteristic are defined in terms of predicates. For numeric characteristics, predicates are used to implement the mapping from a range of continuous values to a partial score. For example, age range 20 to 29 may map to partial score "15". For categorical characteristics, predicates are used to implement the mapping of categorical values to partial scores. Note that while predicates will not (typically) overlap, the Scoring Procedure requires the ordering of Attributes to be respected, and that the first matching Attribute shall determine the partial score value.
- Value parameters:
- complexPartialScore
Used to implement complex point allocation of the score points awarded to the Attribute . To be used in lieu of attribute partialScore. If both are defined, element ComplexPartialScore takes precedence over attribute partialScore for computing the score points awarded to the Attribute. Whenever element ComplexPartialScore is used, the actual partial score is the value returned by the EXPRESSION (see Transformations for more information).
- partialScore
Defines the score points awarded to the Attribute. Note that attribute partialScore is optional; a partial score must nevertheless be specified for every Attribute, either through the partialScore attribute or through the ComplexPartialScore element.
- predicate
The condition upon which the mapping between input attribute and partial score takes place. For more details on PREDICATE see the section on predicates in TreeModel for an explanation of how predicates are described and evaluated. In scorecard models, all the predicates defining the Attributes for a particular Characteristic must reference a single field.
- reasonCode
Defines the attribute's reason code. If the reasonCode attribute is used in this level, it takes precedence over the reasonCode attribute associated with the Characteristic element.
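The first-match semantics described above can be sketched as follows (illustrative only, not the PMML4S API; predicates are modeled as plain Python functions):

```python
# Illustrative sketch: partial-score lookup for one characteristic.
# Attributes are (predicate, partialScore) pairs checked in document order;
# the first predicate that matches determines the partial score.

def partial_score(attributes, value):
    for predicate, score in attributes:
        if predicate(value):
            return score
    return None  # no attribute matched

# Numeric characteristic "age": ranges mapped to partial scores.
age_attributes = [
    (lambda v: v < 20,       10.0),
    (lambda v: 20 <= v < 30, 15.0),  # e.g. age range 20 to 29 -> 15
    (lambda v: v >= 30,      25.0),
]
```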
An informational string describing the technique used by the model designer to establish the baseline scores. Allowed values are:
- max: Indicates that baseline scores are the maximum partial score in element Characteristic
- min: Baseline scores are the minimum partial score in Characteristic
- mean: Baseline scores are the mean (weighted average) partial score in Characteristic
- neutral: Baseline scores are the risk-neutral partial score in Characteristic
- other: Baseline scores are derived using any other technique.
This attribute is purely informational and does not influence the runtime calculations of reason codes. (By contrast, the reasonCodeAlgorithm is critical to achieving an accurate calculation of reasons.)
For a discrete field, each BayesInput contains the counts pairing the discrete values of that field with those of the target field. For a continuous field, the BayesInput element lists the distributions obtained for that field with each value of the target field. BayesInput may also be used to define how continuous values are encoded as discrete bins. (Discretization is achieved using DerivedField; only the Discretize mapping for DerivedField may be invoked here).
Note that a BayesInput element encompasses either one TargetValueStats element or one or more PairCounts elements. Element DerivedField can only be used in conjunction with PairCounts.
Contains several BayesInput elements.
Contains the counts associated with the values of the target field.
Defines the point allocation strategy for each scorecard characteristic (numeric or categorical). Once point allocation between input attributes and partial scores takes place, each scorecard characteristic is assigned a single partial score which is used to compute the overall score. The overall score is simply the sum of all partial scores. Partial scores are assumed to be continuous values of type "double".
- Value parameters:
- attributes
Input attributes for each scorecard characteristic are defined in terms of predicates.
- baselineScore
Sets the characteristic's baseline score against which to compare the actual partial score when determining the ranking of reason codes. This attribute is required when useReasonCodes attribute is "true" and attribute baselineScore is not defined in element Scorecard. Whenever baselineScore is defined for a Characteristic, it takes precedence over the baselineScore attribute value defined in element Scorecard. Note that the design-time technique used to determine the baseline scores is captured in the baselineMethod attribute.
- name
Name of the characteristic. For informational reasons only.
- reasonCode
Contains the characteristic's reason code, which will be later mapped to a business reason usually associated with an adverse decision.
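The scoring procedure described above can be sketched as follows. This is an illustration, not the PMML4S API; the ranking shown assumes the "pointsBelow" reasonCodeAlgorithm, where reason codes are ordered by how far each partial score falls below its baseline:

```python
# Illustrative sketch: the overall score is the sum of the per-characteristic
# partial scores, and reason codes are ranked by (baselineScore - partialScore)
# in descending order ("pointsBelow"; "pointsAbove" reverses the sign).

def score_and_reasons(characteristics):
    # characteristics: list of (reason_code, partial_score, baseline_score)
    overall = sum(p for _, p, _ in characteristics)
    ranked = sorted(characteristics, key=lambda c: c[2] - c[1], reverse=True)
    reasons = [rc for rc, _, _ in ranked]
    return overall, reasons

overall, reasons = score_and_reasons([
    ("RC_AGE", 15.0, 20.0),     # 5 points below baseline
    ("RC_INCOME", 30.0, 18.0),  # above baseline
    ("RC_DEBT", 5.0, 25.0),     # 20 points below baseline
])
```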
Envelope for all scorecard characteristics.
A cluster is defined by its center vector or by statistics. A center vector is implemented by a NUM-ARRAY. Each Partition corresponds to a cluster and holds field statistics to describe it. The definition of a cluster may contain a center vector as well as statistics. The attribute modelClass in the ClusteringModel defines which one is used to actually define the cluster.
- Value parameters:
- compareFunction
A function taking two field values and a similarityScale to define similarity/distance. It can override the general specification of compareFunction in ComparisonMeasure.
- comparisons
A matrix which contains the similarity values or distance values.
- field
Refers (by name) to a MiningField or to a DerivedField.
- fieldWeight
The importance factor for the field. This field weight is used in the comparison functions in order to compute the comparison measure. The value must be a number greater than 0. The default value is 1.0.
- isCenterField
Indicates whether the respective field is a center field, i.e. a component of the center, in a center-based model. Only center fields correspond to the entries in the center vectors in order.
- similarityScale
The distance such that similarity becomes 0.5.
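For instance, PMML's gaussSim compare function is defined so that similarity is exactly 0.5 when the difference between the two values equals the similarityScale (a sketch, not the PMML4S API):

```python
import math

# Sketch of the gaussSim compare function: the similarity between two field
# values x and y with difference z = x - y is exp(-ln(2) * z^2 / s^2), where
# s is the similarityScale. At |z| == s the similarity is 0.5, matching the
# definition of similarityScale above.

def gauss_sim(x, y, similarity_scale):
    z = x - y
    return math.exp(-math.log(2.0) * z * z / (similarity_scale ** 2))
```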
A cluster model basically consists of a set of clusters. For each cluster a center vector can be given. In center-based models a cluster is defined by a vector of center coordinates. Some distance measure is used to determine the nearest center, that is the nearest cluster for a given input record. For distribution-based models (e.g., in demographic clustering) the clusters are defined by their statistics. Some similarity measure is used to determine the best matching cluster for a given record. The center vectors then only approximate the clusters.
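A center-based assignment can be sketched as follows (illustrative only, not the PMML4S API; squared Euclidean distance is assumed as the distance measure):

```python
# Illustrative sketch: center-based cluster assignment. Each cluster has a
# center vector; a record is assigned to the cluster whose center is nearest
# under the chosen distance measure (squared Euclidean here).

def assign_cluster(record, centers):
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(centers)), key=lambda i: sq_dist(record, centers[i]))

centers = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
```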
Represents a single support vector coefficient αi.
Used to store the support vector coefficients αi and b.
- Value parameters:
- absoluteValue
Contains the value of the absolute coefficient b.
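These coefficients enter the SVM decision function f(x) = Σi αi K(xi, x) + b. A sketch (not the PMML4S API), using the linear kernel K(x, y) = <x, y>:

```python
# Illustrative sketch: SVM decision function built from the stored
# coefficients alpha_i and the absolute coefficient b, with the linear kernel.

def linear_kernel(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def svm_decision(x, support_vectors, alphas, b):
    # f(x) = sum_i alpha_i * K(x_i, x) + b
    return sum(a * linear_kernel(sv, x)
               for sv, a in zip(support_vectors, alphas)) + b

f = svm_decision([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, -0.25], 0.1)
```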
Comparisons is a matrix which contains the similarity values or distance values, depending on the attribute modelClass in ClusteringModel. The order of the rows and columns corresponds to the order of discrete values or intervals in that field.
Defines ComplexPartialScore; the actual partial score is the value returned by the EXPRESSION (see org.pmml4s.transformations for more information).
CompoundRule consists of a predicate and one or more rules. CompoundRules offer a shorthand for a more compact representation of rulesets and suggest a more efficient execution mechanism.
- Value parameters:
- predicate
the condition upon which the rule fires.
- rules
One or more rules that are contained within the CompoundRule. Each of these rules may be a SimpleRule or a CompoundRule.
Defines the connections coming into that parent element. The neuron identified by from may be part of any layer.
Stores coordinate-by-coordinate variances (diagonal cells) and covariances (non-diagonal cells).
List of covariate names. Will not be present when there is no covariate. Each name in the list must match a DataField name or a DerivedField name. The covariates will be treated as continuous variables.
Specifies the cumulative link function used in an ordinalMultinomial model.
DataModel is a container for all metadata; it is the parent model of all predictive models.
The probability distribution of the dependent variable for the generalizedLinear model.
List of factor (categorical predictor) names. Not present if this particular regression flavor does not support factors (e.g., linear regression). If present, the list may or may not be empty. Each name in the list must match a DataField name or a DerivedField name. The factors must be categorical variables.
Specifies the type of regression model in use. This information will be used to select the appropriate mathematical formulas during scoring.
Definition of a general regression model. As the name suggests, it is intended to support a multitude of regression models.
- Value parameters:
- column
Defines the name of the tag or column used by element InlineTable. This attribute is required if element InlineTable is used to represent training data.
- field
Contains the name of a DataField or a DerivedField (in case isTransformed is set to "true"). Can also contain the name of the case ID variable.
Serves as an envelope for all the fields included in the training instances. It encapsulates InstanceField elements.
Obviously the id of an Item must be unique. Furthermore the Item values must be unique, or if they are not unique then the field and category attributes must distinguish them. That is, an AssociationModel must not have different instances of Item where the values of the value, field, and category attributes are all the same. The entries in mappedValue may be the same, though. Here are some examples of Items:
- Value parameters:
- id
An identification to uniquely identify an item.
- mappedValue
Optional, a value to which the original item value is mapped. For instance, this could be a product name if the original value is an EAN code.
- value
The value of the item as in the input data.
- weight
The weight of the item. For example, the price or value of an item.
Item references point to elements of type Item.
- Value parameters:
- itemRef
Contains the identification of an item.
- Value parameters:
- id
An identification to uniquely identify an Itemset.
- itemRefs
Item references point to elements of type Item
- numberOfItems
The number of Items contained in this Itemset
- support
The relative support of the Itemset: support(set) = (number of transactions containing the set) / (total number of transactions)
- Value parameters:
- field
Contains the name of a DataField or a DerivedField. If a DerivedField is used and isTransformed is false, the training instances will also need to be transformed together with the k-NN input.
- fieldWeight
Defines the importance factor for the field. It is used in the comparison functions to compute the comparison measure. The value must be a number greater than 0. The default value is 1.0.
Encapsulates several KNNInput elements which define the fields used to query the k-NN model, one KNNInput element per field.
The element KohonenMap is appropriate for clustering models that were produced by a Kohonen map algorithm. The attributes coord1, coord2 and coord3 describe the position of the current cluster in a map with up to three dimensions. This element is not relevant to the scoring function.
Linear basis functions which lead to a hyperplane as classifier. K(x,y) = <x,y>
Specifies the type of link function to use when the generalizedLinear model type is specified.
Contains an array of non-negative real values, it is required when the algorithm type is clusterMeanDist. The length of the array must equal the number of clusters in the model, and the values in it are the mean distances/similarities to the center for each cluster.
The element MiningModel allows precise specification of the usage of multiple models within one PMML file. The two main approaches are Model Composition, and Segmentation.
Model Composition includes model sequencing and model selection but is only applicable to Tree and Regression models. Segmentation allows representation of different models for different data segments and also can be used for model ensembles and model sequences. Scoring a case using a model ensemble consists of scoring it using each model separately, then combining the results into a single scoring result using one of the pre-defined combination methods. Scoring a case using a sequence, or chain, of models allows the output of one model to be passed in as input to subsequent models.
ModelComposition uses "embedded model elements" that are defeatured copies of "standalone model elements" -- specifically, Regression for RegressionModel, DecisionTree for TreeModel. Besides being limited to Regression and Tree models, these embedded model elements lack key features like a MiningSchema (essential to manage scope across multiple model elements). Therefore, in PMML 4.2, the Model Composition approach has been deprecated since the Segmentation approach allows for a wider range of models to be used more reliably. For more on deprecation, see Conformance.
Segmentation is accomplished by using any PMML model element inside of a Segment element, which also contains a PREDICATE and an optional weight. MiningModel then contains Segmentation element with a number of Segment elements as well as the attribute multipleModelMethod specifying how all the models applicable to a record should be combined. It is also possible to use a combination of model composition and segmentation approaches, using simple regression or decision trees for data preprocessing before segmentation.
The missing prediction treatment options are used when at least one model for which the predicate in the Segment evaluates to true has a missing result. The attribute missingThreshold is closely related and has default value 1. The options are defined as follows:
- returnMissing means that if at least one model has a missing result, the whole MiningModel's result should be missing.
- skipSegment says that if a model has a missing result, that segment is ignored and the results are computed based on other segments. However, if the fraction of the models with missing results (weighted if the model combination method is weighted) exceeds the missingThreshold, the returned result must be missing. This option should not be used with the modelChain combination method.
- continue says that if a model has a missing result, the processing should continue normally. This can work well for voting or modelChain situations, as well as returnFirst and returnAll. In case of majorityVote or weightedMajorityVote the missing result can be returned if it gets the most (possibly weighted) votes, or if the fraction of the models with missing results exceeds the missingThreshold. Otherwise a valid result is computed normally. Other model combination methods will return a missing value as the result.
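The skipSegment option, including the missingThreshold check, can be sketched as follows (illustrative only, not the PMML4S API; a weighted average is assumed as the combination method):

```python
# Illustrative sketch of skipSegment: segments whose result is missing (None)
# are dropped, unless the weighted fraction of missing results exceeds
# missingThreshold, in which case the whole MiningModel result is missing.

def combine_skip_segment(results, weights, missing_threshold=1.0):
    total = sum(weights)
    missing = sum(w for r, w in zip(results, weights) if r is None)
    if total > 0 and missing / total > missing_threshold:
        return None  # too many missing results: overall result is missing
    kept = [(r, w) for r, w in zip(results, weights) if r is not None]
    if not kept:
        return None
    # weighted average of the remaining segment results
    return sum(r * w for r, w in kept) / sum(w for _, w in kept)
```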
Defines a strategy for dealing with missing values.
MissingValueWeights is used to adjust distance or similarity measures for missing data.
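One common convention, assumed here for illustration (check the PMML specification for the exact rule), is to scale the measure computed over the non-missing fields by the ratio of total weight to non-missing weight, extrapolating the skipped comparisons:

```python
# Hedged sketch (assumed convention, not the PMML4S API): when fields are
# missing, their comparisons are skipped and the partial measure is scaled up
# by total weight over the weight of the fields actually compared.

def adjusted_distance(partial_distance, weights, present):
    # weights: MissingValueWeights per field; present: bool per field
    total = sum(weights)
    nonmissing = sum(w for w, p in zip(weights, present) if p)
    return partial_distance * total / nonmissing
```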
Abstract class that represents a PMML model.
Specifies how all the models applicable to a record should be combined.
A normalization method softmax ( pj = exp(yj) / Sumi(exp(yi) ) ) or simplemax ( pj = yj / Sumi(yi) ) can be applied to the computed activation values. The attribute normalizationMethod is defined for the network with default value none ( pj = yj ), but can be specified for each layer as well. Softmax normalization is most often applied to the output layer of a classification network to get the probabilities of all answers. Simplemax normalization is often applied to the hidden layer consisting of elements with radial basis activation function to get a "normalized RBF" activation.
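Written out directly, the two normalization methods quoted above are:

```python
import math

# softmax:   p_j = exp(y_j) / sum_i exp(y_i)
# simplemax: p_j = y_j / sum_i y_i

def softmax(ys):
    exps = [math.exp(y) for y in ys]
    s = sum(exps)
    return [e / s for e in exps]

def simplemax(ys):
    s = sum(ys)
    return [y / s for y in ys]
```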
Naïve Bayes uses Bayes' Theorem, combined with a ("naive") presumption of conditional independence, to predict the value of a target (output), from evidence given by one or more predictor (input) fields.
Naïve Bayes models require the target field to be discretized so that a finite number of values are considered by the model.
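The combination rule can be sketched as follows (illustrative only, not the PMML4S API; all names are hypothetical):

```python
# Illustrative sketch of the naive Bayes combination: for each target value t,
# score(t) = P(t) * product_i P(x_i | t), assuming conditional independence of
# the predictors; the scores are then normalized into probabilities.

def naive_bayes(priors, likelihoods, evidence):
    # priors: {target: P(target)}
    # likelihoods: {target: {field: {value: P(value | target)}}}
    # evidence: {field: observed value}
    scores = {}
    for t, prior in priors.items():
        p = prior
        for field, value in evidence.items():
            p *= likelihoods[t][field][value]
        scores[t] = p
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()}

probs = naive_bayes(
    {"yes": 0.5, "no": 0.5},
    {"yes": {"f": {"a": 0.8, "b": 0.2}}, "no": {"f": {"a": 0.4, "b": 0.6}}},
    {"f": "a"},
)
```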
k-Nearest Neighbors (k-NN) is an instance-based learning algorithm. In a k-NN model, a hypothesis or generalization is built from the training data directly at the time a query is made to the system. The prediction is based on the K training instances closest to the case being scored. Therefore, all training cases have to be stored, which may be problematic when the amount of data is large. This model has the ability to store the data directly in PMML using InlineTable or elsewhere using the TableLocator element defined in the Taxonomy document.
A k-NN model can have one or more target variables or no targets. When one or more targets are present, the predicted value is computed based on the target values of the nearest neighbors. When no targets are present, the model specifies a case ID variable for the training data. In this way, one can easily obtain the IDs of the K closest training cases (nearest neighbors).
A k-NN model consists of four major parts:
- Model attributes
- Training instances
- Comparison measure
- Input fields
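The scoring procedure can be sketched as follows (illustrative, not the PMML4S API; Euclidean distance and majority voting are assumed):

```python
from collections import Counter

# Illustrative sketch of k-NN scoring: the K training instances closest to
# the query vote on the target value. With a case ID variable instead of a
# target, the same neighbors' IDs would be returned.

def knn_predict(query, instances, k):
    # instances: list of (feature_vector, target_value)
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    neighbors = sorted(instances, key=lambda inst: dist(inst[0], query))[:k]
    votes = Counter(t for _, t in neighbors)
    return votes.most_common(1)[0][0]

train = [([0.0, 0.0], "A"), ([0.1, 0.2], "A"),
         ([5.0, 5.0], "B"), ([5.1, 4.9], "B")]
```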
Defines how input fields are normalized so that the values can be processed in the neural network. For example, string values must be encoded as numeric values.
An input neuron represents the normalized value for an input field. A numeric input field is usually mapped to a single input neuron while a categorical input field is usually mapped to a set of input neurons using some fan-out function. The normalization is defined using the elements NormContinuous and NormDiscrete defined in the Transformation Dictionary. The element DerivedField is the general container for these transformations.
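The two normalizations can be sketched as follows (a Python illustration, not the pmml4s API): NormContinuous interpolates linearly between (orig, norm) anchor points, and NormDiscrete fans a categorical value out to one indicator neuron per category. Clamping at the endpoints is a simplification — PMML's handling of out-of-range values is configurable.

```python
def norm_continuous(x, points):
    """Piecewise-linear interpolation over (orig, norm) anchor pairs,
    in the spirit of PMML's NormContinuous/LinearNorm.
    Clamps outside the anchor range (a simplification)."""
    points = sorted(points)
    if x <= points[0][0]:
        return points[0][1]
    for (o1, n1), (o2, n2) in zip(points, points[1:]):
        if x <= o2:
            return n1 + (x - o1) / (o2 - o1) * (n2 - n1)
    return points[-1][1]

def norm_discrete(value, categories):
    """One input neuron per category: 1.0 for the matching category,
    0.0 otherwise (the fan-out behavior of NormDiscrete)."""
    return [1.0 if value == c else 0.0 for c in categories]
```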
A neural network has one or more input nodes and one or more neurons. Some neurons' outputs are the output of the network. The network is defined by the neurons and their connections (the weights). All neurons are organized into layers; the sequence of layers defines the order in which the activations are computed. All output activations for neurons in some layer L are evaluated before computation proceeds to the next layer L+1. Note that this still allows for recurrent networks, since outputs of neurons in a later layer L+i (i > 0) can be used as inputs in layer L. The model does not define a specific evaluation order for neurons within a layer.
Defines how the output of the neural network must be interpreted.
Contains an identifier id which must be unique in all layers. The attribute bias implicitly defines a connection to a bias unit where the unit's value is 1.0 and the weight is the value of bias. The activation function and normalization method for Neuron can be defined in NeuralLayer. If either one is not defined for the layer then the default one specified for NeuralNetwork applies. If the activation function is radialBasis, the attribute width must be specified either in Neuron, NeuralLayer or NeuralNetwork. Again, width specified in Neuron will override a respective value from NeuralLayer, and in turn will override a value given in NeuralNetwork.
Weighted connections between neural net nodes are represented by Con elements.
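Putting the pieces above together, a minimal forward pass can be sketched as follows (a Python illustration, not the pmml4s API). Each neuron computes bias + Σ wᵢ·xᵢ and applies the layer's activation function; layers are evaluated strictly in order, as the model requires.

```python
import math

def neuron_output(inputs, weights, bias, activation=math.tanh):
    """z = bias + sum(w_i * x_i), then the activation function.
    The bias acts as a connection to a unit fixed at 1.0."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(z)

def forward(input_values, layers):
    """Evaluate layers in order: all activations in layer L are computed
    before layer L+1. Each layer is a list of (weights, bias) per neuron;
    identity activation is used here to keep the sketch simple."""
    acts = list(input_values)
    for layer in layers:
        acts = [neuron_output(acts, w, b, activation=lambda z: z)
                for (w, b) in layer]
    return acts
```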
Defines what to do in situations where scoring cannot reach a leaf node.
This element is an encapsulation for either defining a split or a leaf in a tree model. Every Node contains a predicate that identifies a rule for choosing itself or any of its siblings. A predicate may be an expression composed of other nested predicates.
Cell in the ParamMatrix. The optional targetCategory and required parameterName attributes determine the cell's location in the Parameter matrix. The information contained is: beta (actual Parameter value, required), and df (degrees of freedom, optional). For ordinalMultinomial model ParamMatrix specifies different values for the intercept parameter: one for each target category except one. Values for all other parameters are constant across all target variable values. For multinomialLogistic model ParamMatrix specifies parameter estimates for each target category except the reference category.
Matrix of Parameter estimate covariances. Made up of PCovCells, each of them being located via row information for Parameter name (pRow), row information for target variable value (tRow), column information for Parameter name (pCol) and column information for target variable value (tCol). Note that the matrix is symmetric with respect to the main diagonal (interchanging tRow and tCol together with pRow and pCol will not change the value). Therefore it is sufficient that only half of the matrix be exported. Attributes tRow and tCol are optional since they are not needed for linear regression models. This element has an optional attribute type that can take values model and robust. This attribute describes the way the covariance matrix was computed in generalizedLinear model. The robust option is also known as Huber-White or sandwich or HCCM.
Cell in the PPMatrix. Knows its row name and column name.
Predictor-to-Parameter correlation matrix. It is a rectangular matrix having a column for each Predictor (factor or covariate) and a row for each Parameter. The matrix is represented as a sequence of cells, each cell containing a number representing the correlation between the Predictor and the Parameter.
PairCounts lists, for a field Ii's discrete value Iij, the TargetValueCounts that pair the value Iij with each value of the target field.
Parameter matrix. A table containing the Parameter values along with associated statistics (degrees of freedom). One dimension has the target variable's categories, the other has the Parameter names. The table is represented by specifying each cell. There is no requirement for Parameter names other than that each name should uniquely identify one Parameter.
Each Parameter contains a required name and optional label.
- Value parameters:
- label
If present, is meant to give a hint on a Parameter's correlation with the Predictors.
- name
Should be unique within the model and as brief as possible (since Parameter names appear frequently in the document).
- referencePoint
The optional attribute referencePoint is used in Cox regression models only and has a default value of 0.
Lists all Parameters. ParameterList can be empty only for CoxRegression models, for other models at least one Parameter should be present.
Polynomial basis functions, which lead to a polynomial classifier: K(x,y) = (gamma*&lt;x,y&gt; + coef0)^degree
Describes a categorical (factor) or a continuous (covariate) predictor for the model. When describing a factor, it can optionally contain a list of categories and a contrast matrix. Such a matrix describes the codings of categorical variables. If a categorical variable has n values, there will be n rows and n-1 or n columns in the matrix. The rows and columns correspond to the categories of the factor in the order listed in the Category element if it is present, otherwise in the order listed in the DataField or DerivedField element. If the Categories element is present and the corresponding DataField or DerivedField element has a list of valid categories, then the list in Categories should be a subset of that in DataField or DerivedField. A contrast matrix with n-1 columns helps to reduce the total number of parameters in the model.
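As one concrete coding that produces the n-1 column case described above, treatment (dummy) coding builds one indicator column per non-reference category. This is only an illustration of one common contrast scheme — PMML does not mandate any particular coding.

```python
def treatment_coding(categories, reference):
    """n x (n-1) contrast matrix: one indicator column per non-reference
    category, one row per category (treatment/dummy coding)."""
    cols = [c for c in categories if c != reference]
    return [[1 if row == col else 0 for col in cols] for row in categories]
```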
Radial basis functions, the most common kernel type: K(x,y) = exp(-gamma*||x - y||^2)
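The polynomial and radial basis kernels (and the sigmoid kernel described further below) translate directly into code; a short Python sketch:

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def poly_kernel(x, y, gamma=1.0, coef0=0.0, degree=2):
    # K(x,y) = (gamma*<x,y> + coef0)^degree
    return (gamma * dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    # K(x,y) = exp(-gamma*||x - y||^2)
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # K(x,y) = tanh(gamma*<x,y> + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)
```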
Describes how reason codes shall be ranked.
The regression functions are used to determine the relationship between the dependent variable (target field) and one or more independent variables. The dependent variable is the one whose values you want to predict, whereas the independent variables are the variables that you base your prediction on. While the term regression usually refers to the prediction of numeric values, the PMML element RegressionModel can also be used for classification. This is due to the fact that multiple regression equations can be combined in order to predict categorical values.
Specifies the type of a regression model. The attribute modelType is for information only.
Describes how the prediction is converted into a confidence value (aka probability).
Describes how rules are selected to apply the model to a new case
- Value parameters:
- criterion
explains how to determine and rank predictions and their associated confidences from the ruleset in case multiple rules fire.
- Value parameters:
- defaultConfidence
provides a confidence to be returned with the default score (when scoring a case and no rules in the ruleset fire).
- defaultScore
The value of score in a RuleSet serves as the default predicted value when scoring a case and no rules in the ruleset fire.
- nbCorrect
indicates the number of training/test instances for which the default score is correct.
- recordCount
The number of training/test cases to which the ruleset was applied to generate support and confidence measures for individual rules.
- ruleSelectionMethods
specifies how to select rules from the ruleset to score a new case. If more than one method is included, the first method is used as the default method for scoring, but the other methods included may be selected by the application wishing to perform scoring as valid alternative methods.
- rules
contains 0 or more rules which comprise the ruleset.
- scoreDistributions
describe the distribution of the predicted value in the test/training data.
Ruleset models can be thought of as flattened decision tree models. A ruleset consists of a number of rules. Each rule contains a predicate and a predicted class value, plus some information collected at training or testing time on the performance of the rule.
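A minimal sketch of "firstHit" rule selection with the default-score fallback described above (in Python, not the pmml4s API; the rule tuple layout is an assumption of this sketch):

```python
def score_first_hit(rules, case, default_score, default_confidence=None):
    """'firstHit' selection: return (score, confidence) of the first rule
    whose predicate fires; when no rule fires, fall back to the ruleset's
    default score. Each rule is a (predicate, score, confidence) tuple."""
    for predicate, score, confidence in rules:
        if predicate(case):
            return score, confidence
    return default_score, default_confidence
```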
The two most popular methods for multi-class classification are one-against-all (also known as one-against-rest) and one-against-one. Depending on the method used, the number of SVMs built will differ.
The SVM classification method specifies which of the two methods is used:
Usually the SVM model uses support vectors to define the model function. However, for the case of a linear function (linear kernel type) the function is a linear hyperplane that can be more efficiently expressed using the coefficients of all mining fields. In this case, no support vectors are required at all, and hence SupportVectors will be absent and only the Coefficients element is necessary.
The SVM representation specifies which of the two representations is used:
A data mining model contains a set of input fields which are used to predict a certain target value. This prediction can be seen as an assessment about a prospect, a customer, or a scenario for which an outcome is predicted based on historical data. In a scorecard, input fields, also referred to as characteristics (for example, "age"), are broken down into attributes (for example, "19-29" and "30-39" age groups or ranges) with specific partial scores associated with them. These scores represent the influence of the input attributes on the target and are readily available for inspection. Partial scores are then summed up so that an overall score can be obtained for the target value.
Scorecards are very popular in the financial industry for their interpretability and ease of implementation, and because input attributes can be mapped to a series of reason codes which provide explanations of each individual's score. Usually, the lower the overall score produced by a scorecard, the higher the chances of it triggering an adverse decision, which usually involves the referral or denial of services. Reason codes, as the name suggests, allow for an explanation of scorecard behavior and any adverse decisions generated as a consequence of the overall score. They basically answer the question: "Why is the score low, given its input conditions?"
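The summation described above is simple enough to sketch directly (a Python illustration, not the pmml4s API; the data layout and first-match attribute resolution are assumptions of this sketch):

```python
def scorecard_score(characteristics, case, initial_score=0.0):
    """For each characteristic, the first attribute whose predicate matches
    contributes its partial score and its reason code; partial scores are
    summed into the overall score. Each characteristic is a list of
    (predicate, partial_score, reason_code) tuples."""
    total = initial_score
    reasons = []
    for attributes in characteristics:
        for predicate, partial, reason in attributes:
            if predicate(case):
                total += partial
                reasons.append(reason)
                break
    return total, reasons
```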
Holds attributes of a Scorecard.
Sigmoid kernel functions for some models of Neural Network type K(x,y) = tanh(gamma*<x,y>+coef0)
SimpleRule consists of an identifier, a predicate, a score and information on rule performance.
- Value parameters:
- confidence
Indicates the confidence of the rule.
- id
The value of id serves as a unique identifier for the rule. Must be unique within the ruleset.
- nbCorrect
Indicates the number of training/test instances on which the rule fired and the prediction was correct.
- predicate
the condition upon which the rule fires. For more details on PREDICATE see the section on predicates in TreeModel. This explains how predicates are described and evaluated and how missing values are handled.
- recordCount
The number of training/test instances on which the rule fired.
- score
The predicted value when the rule fires.
- scoreDistributions
Describes the distribution of the predicted value for instances where the rule fires in the training/test data.
- weight
Indicates the relative importance of the rule. May or may not be equal to the confidence.
Indicates whether non-leaf Nodes in the tree model have exactly two children, or an unrestricted number of children.
SupportVector which only has the attribute vectorId - the reference to the support vector in VectorDictionary.
Holds a single instance of an SVM.
SupportVectors holds the support vectors as references into the VectorDictionary used by the respective SVM instance. For storing the SVM coefficients, the element Coefficients is used. Both are combined in the element SupportVectorMachine, which holds a single instance of an SVM.
The attribute targetCategory is required for classification models and gives the corresponding class label. This attribute is to be used for classification models implementing the one-against-all method. In this method, for n classes, there are exactly n SupportVectorMachine elements. Depending on the model attribute maxWins, the SVM with the largest or the smallest value determines the predicted class label.
The attribute alternateTargetCategory is required in case of binary classification models with only one SupportVectorMachine element. It is also required in case of multi-class classification models implementing the one-against-one method. In this method, for n classes, there are exactly n(n-1)/2 SupportVectorMachine elements where each SVM is trained on data from two classes. The first class is represented by the targetCategory attribute and the second class by the alternateTargetCategory attribute. The predicted class label is determined based on a voting scheme in which the category with the maximum number of votes wins. In case of a tie, the predicted class label is the first category with maximal number of votes. For both cases (binary classification and multi-class classification with one-against-one), the corresponding class labels are determined by comparing the numeric prediction with the threshold. If maxWins is true and the prediction is larger than the threshold or maxWins is false and the prediction is smaller than the threshold, the class label is the targetCategory attribute, otherwise, it is the alternateTargetCategory attribute.
Note that each SupportVectorMachine element may have its own threshold that overrides the default.
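The voting and thresholding described above can be sketched as follows (a Python illustration, not the pmml4s API). The tie-break here approximates "first category with the maximal number of votes" by first-seen order, which is an assumption of this sketch.

```python
from collections import Counter

def ovo_predict(svm_outputs, threshold=0.0, max_wins=True):
    """One-against-one voting: each pairwise SVM casts a vote for its
    targetCategory or its alternateTargetCategory, depending on how its
    numeric output compares to the threshold. svm_outputs is a list of
    (target, alternate, value) tuples; the category with the most votes
    wins."""
    votes = Counter()
    order = []  # first-seen order, used as a simple tie-break
    for target, alternate, value in svm_outputs:
        wins_target = value > threshold if max_wins else value < threshold
        winner = target if wins_target else alternate
        if winner not in order:
            order.append(winner)
        votes[winner] += 1
    best = max(votes.values())
    for cat in order:
        if votes[cat] == best:
            return cat
```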
Support Vector Machine models for classification and regression are considered. A Support Vector Machine is a function f which is defined in the space spanned by the kernel basis functions K(x,x_i) of the support vectors x_i: f(x) = Sum_{i=1}^{n} α_i*K(x,x_i) + b.
Here n is the number of all support vectors, α_i are the basis coefficients and b is the absolute coefficient. In an equivalent interpretation, n could also be considered the total number of all training vectors x_i. Then the support vectors are the subset of all those vectors x_i whose coefficients α_i are greater than zero. The term Support Vector (SV) also has a geometrical interpretation, because these vectors really support the discrimination function f(x) = 0 in the mechanical interpretation.
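The decision function above maps directly to code; a self-contained Python sketch with a linear kernel for illustration:

```python
def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def svm_decision(x, support_vectors, alphas, b, kernel):
    """f(x) = sum over i of alpha_i * K(x, x_i) + b,
    where x_i ranges over the support vectors."""
    return sum(a * kernel(x, sv) for a, sv in zip(alphas, support_vectors)) + b
```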
Contains all support vectors required for the respective SVM instance.
Lists the counts associated with each value of the target field; however, a TargetValueCount whose count is zero may be omitted. Within BayesOutput, TargetValueCounts lists the total count of occurrences of each target value. Within PairCounts, TargetValueCounts lists, for each target value, the count of the joint occurrences of that target value with a particular discrete input value.
Used for a continuous input field Ii to define statistical measures associated with each value of the target field. As defined in CONTINUOUS-DISTRIBUTION-TYPES, different distribution types can be used to represent such measures. For Bayes models, these are restricted to Gaussian and Poisson distributions.
Serves as the envelope for element TargetValueStat.
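The two ingredients above can be sketched as follows (a Python illustration, not the pmml4s API): a conditional probability derived from pair and target counts for a discrete input, and a Gaussian density for a continuous input. Replacing a zero count with a small threshold is the standard NaiveBayes safeguard.

```python
import math

def cond_prob(pair_counts, target_counts, value, target, threshold=0.0):
    """P(input = value | target) from the counts; a zero count is
    replaced by the model's threshold to avoid zeroing the product."""
    count = pair_counts.get((value, target), 0)
    return count / target_counts[target] if count > 0 else threshold

def gaussian_likelihood(x, mean, variance):
    """Density for a continuous input under a Gaussian TargetValueStat."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)
```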
Encapsulates the definition of the fields included in the training instances as well as their values.
- Value parameters:
- fieldCount
Defines the number of fields (features + targets). This number needs to match the number of InstanceField elements defined under InstanceFields.
- instanceFields
Defines all the fields included in the training instances.
- isTransformed
Used as a flag to determine whether or not the training instances have already been transformed. If isTransformed is "false", it indicates that the training data has not been transformed yet. If "true", it indicates that it has already been transformed.
- recordCount
Defines the number of training instances or records. This number needs to match the number of instances defined in the element InlineTable or in the external data if TableLocator is used.
- table
Represents the training data (feature vectors and class labels).
Holds attributes of a Tree model
The TreeModel in PMML allows for defining either a classification or prediction structure. Each Node holds a logical predicate expression that defines the rule for choosing the Node or any of the branching Nodes.
Contains the set of support vectors, which are of the type VectorInstance.
Defines which entries in the vectors correspond to which fields.
A data vector given in dense or sparse array format. The order of the values corresponds to that of the VectorFields. The sizes of the sparse arrays must match the number of fields included in the VectorFields element.
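Expanding a sparse array into its dense form can be sketched as follows (a Python illustration, not the pmml4s API). The 1-based indices follow PMML's sparse array convention, though that detail is worth verifying against the spec.

```python
def sparse_to_dense(indices, values, size):
    """Expand a sparse array (1-based indices plus entries) into a dense
    vector of the given size; unmentioned positions default to 0.0."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i - 1] = v
    return dense
```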