org.pmml4s.model

PMML is a standard for XML documents which express trained instances of analytic models. The following classes of model are addressed:

Type members

Classlikes

object ActivationFunction extends Enumeration
object AlgorithmType extends Enumeration

Defines model types used by the anomaly model.

class AnomalyDetectionAttributes(val functionName: MiningFunction, val algorithmType: AlgorithmType, val sampleDataSize: Option[Long], val modelName: Option[String], val algorithmName: Option[String], val isScorable: Boolean) extends ModelAttributes with HasAnomalyDetectionAttributes

Holds attributes of an Anomaly Detection Model.

class AnomalyDetectionModel(var parent: Model, val attributes: AnomalyDetectionAttributes, val miningSchema: MiningSchema, val model: Model, val meanClusterDistances: Option[MeanClusterDistances], val output: Option[Output], val localTransformations: Option[LocalTransformations], val modelVerification: Option[ModelVerification], val extensions: Seq[Extension]) extends Model with HasWrappedAnomalyDetectionAttributes

Anomaly detection (also outlier detection) is the identification of items, events, or observations that do not conform to an expected pattern or to other items in a data set. Traditional approaches are distance- and density-based; common ways to define distance or density include the distance to the k nearest neighbors or the count of points within a given fixed radius. These methods, however, cannot handle data sets with regions of different densities and do not scale well to large data. Other algorithms have been proposed that handle such cases better; at this time the PMML standard supports three such algorithms:

  • Isolation Forest
  • One Class SVM
  • Clustering mean distance based anomaly detection model
  • Other models can also be used if their scoring follows PMML standard rules.
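
For the isolation-forest case, the normalized score commonly encoded by such models is s(x, n) = 2^(−E(h(x))/c(n)), where E(h(x)) is the record's mean path length over the trees and c(n) is the average path length of an unsuccessful search over n sample points. A minimal sketch of that normalization (helper names are ours, not part of the pmml4s API):

```scala
// Isolation-forest score normalization; illustrative only, not pmml4s API.
// H(k) is approximated by ln(k) + the Euler-Mascheroni constant.
def harmonic(k: Int): Double = math.log(k.toDouble) + 0.5772156649

// c(n) = 2*H(n-1) - 2*(n-1)/n: average unsuccessful-search path length.
def avgPathLength(n: Int): Double =
  if (n <= 1) 0.0 else 2.0 * harmonic(n - 1) - 2.0 * (n - 1).toDouble / n

// Score in (0, 1]: near 1 is anomalous, near 0.5 is an average point.
def anomalyScore(meanDepth: Double, sampleSize: Int): Double =
  math.pow(2.0, -meanDepth / avgPathLength(sampleSize))
```

A record whose mean depth equals c(n) scores exactly 0.5; records isolated much earlier score close to 1.
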
class AssociationAttributes(val numberOfTransactions: Int, val minimumSupport: Double, val minimumConfidence: Double, val numberOfItems: Int, val numberOfItemsets: Int, val numberOfRules: Int, val maxNumberOfItemsPerTA: Option[Int], val avgNumberOfItemsPerTA: Option[Double], val lengthLimit: Option[Int], val functionName: MiningFunction, val modelName: Option[String], val algorithmName: Option[String], val isScorable: Boolean) extends ModelAttributes with HasAssociationAttributes
class AssociationModel(var parent: Model, val attributes: AssociationAttributes, val miningSchema: MiningSchema, val items: Array[Item], val itemsets: Array[Itemset], val associationRules: Array[AssociationRule], val output: Option[Output], val targets: Option[Targets], val localTransformations: Option[LocalTransformations], val modelStats: Option[ModelStats], val modelExplanation: Option[ModelExplanation], val modelVerification: Option[ModelVerification], val extensions: Seq[Extension]) extends Model with HasWrappedAssociationAttributes

The Association Rule model represents rules where some set of items is associated to another set of items. For example a rule can express that a certain product or set of products is often bought in combination with a certain set of other products, also known as Market Basket Analysis. An Association Rule model typically has two variables: one for grouping records together into transactions (usageType="group") and another that uniquely identifies each record (usageType="active"). Alternatively, association rule models can be built on regular data, where each category of each categorical field is an item. Yet another possible format of data is a table with true/false values, where only the fields having true value in a record are considered valid items.

An Association Rule model consists of four major parts:

  • Model attributes
  • Items
  • ItemSets
  • AssociationRules
class AssociationRule(val antecedent: String, val consequent: String, val support: Double, val confidence: Double, val lift: Option[Double], val leverage: Option[Double], val affinity: Option[Double], val id: Option[String]) extends HasPredictedValue with HasEntityId with HasConfidence with PmmlElement

We consider association rules of the form "A => C", where itemset A is the antecedent and itemset C is the consequent, next:

Value parameters:
affinity

Also known as Jaccard Similarity, affinity is a measure of the transactions that contain both the antecedent and consequent (intersect) compared to those that contain the antecedent or the consequent (union): affinity(A->C) = support(A+C) / [ support(A) + support(C) - support(A+C)]

antecedent

The id value of the itemset which is the antecedent of the rule. We represent the itemset by the letter A.

confidence

The confidence of the rule: confidence(A->C) = support(A+C) / support(A)

consequent

The id value of the itemset which is the consequent of the rule. We represent the itemset by the letter C.

id

An identification to uniquely identify an association rule.

leverage

Another measure of interestingness is leverage. An association with higher frequency and lower lift may be more interesting than an alternative rule with lower frequency and higher lift. The former can be more important in practice because it applies to more cases. The value is the difference between the observed frequency of A+C and the frequency that would be expected if A and C were independent: leverage(A->C) = support(A->C) - support(A)*support(C)

lift

A very popular measure of interestingness of a rule is lift. Lift values greater than 1.0 indicate that transactions containing A tend to contain C more often than transactions that do not contain A: lift(A->C) = confidence(A->C) / support(C)

support

The support of the rule, that is, the relative frequency of transactions that contain A and C: support(A->C) = support(A+C)
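
All of the measures above derive from itemset supports; a minimal sketch (function names are illustrative, not the pmml4s API):

```scala
// Rule measures for A => C from relative supports, per the definitions above.
def confidence(supA: Double, supAC: Double): Double = supAC / supA
def lift(supA: Double, supC: Double, supAC: Double): Double =
  confidence(supA, supAC) / supC
def leverage(supA: Double, supC: Double, supAC: Double): Double =
  supAC - supA * supC
def affinity(supA: Double, supC: Double, supAC: Double): Double =
  supAC / (supA + supC - supAC)
```

For example, with support(A) = 0.5, support(C) = 0.4 and support(A+C) = 0.3, confidence is 0.6 and lift is 1.5.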

class Attribute(val reasonCode: Option[String], val partialScore: Option[Double], val predicate: Predicate, val complexPartialScore: Option[ComplexPartialScore]) extends Predicate with PmmlElement

Input attributes for each scorecard characteristic are defined in terms of predicates. For numeric characteristics, predicates implement the mapping from a range of continuous values to a partial score. For example, age range 20 to 29 may map to partial score "15". For categorical characteristics, predicates implement the mapping of categorical values to partial scores. Note that while predicates will not (typically) overlap, the Scoring Procedure requires the ordering of Attributes to be respected, and the first matching Attribute determines the partial scored value.

Value parameters:
complexPartialScore

Used to implement complex point allocation of the score points awarded to the Attribute. To be used in lieu of attribute partialScore. If both are defined, element ComplexPartialScore takes precedence over attribute partialScore for computing the score points awarded to the Attribute. Whenever element ComplexPartialScore is used, the actual partial score is the value returned by the EXPRESSION (see Transformations for more information).

partialScore

Defines the score points awarded to the Attribute. Note that the partialScore attribute itself is optional, but a partial score must nevertheless be specified for every Attribute, either through the partialScore attribute or through the ComplexPartialScore element defined below.

predicate

The condition upon which the mapping between input attribute and partial score takes place. For more details on PREDICATE, see the section on predicates in TreeModel for an explanation of how predicates are described and evaluated. In scorecard models, all the predicates defining the Attributes for a particular Characteristic must reference a single field.

reasonCode

Defines the attribute's reason code. If the reasonCode attribute is used in this level, it takes precedence over the reasonCode attribute associated with the Characteristic element.
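
The first-matching-Attribute rule above can be sketched as an ordered scan; predicates are reduced to plain functions here (real PMML predicates are the TreeModel predicate elements, and these names are illustrative):

```scala
// First-match partial-score lookup over a characteristic's ordered Attributes.
final case class ScoredAttribute(matches: Double => Boolean, partialScore: Double)

def partialScoreFor(attrs: Seq[ScoredAttribute], value: Double): Option[Double] =
  attrs.collectFirst { case a if a.matches(value) => a.partialScore }

// Age range 20 to 29 maps to partial score 15, as in the example above.
val ageAttrs = Seq(
  ScoredAttribute(v => v >= 20 && v < 30, 15.0),
  ScoredAttribute(v => v >= 30, 25.0)
)
```

Because only the first match counts, attribute order matters whenever predicates overlap.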

class BaseCumHazardTables(val baselineStratums: Array[BaselineStratum], val baselineCells: Array[BaselineCell], val maxTime: Option[Double]) extends PmmlElement
class BaselineCell(val time: Double, val cumHazard: Double) extends PmmlElement
object BaselineMethod extends Enumeration

An informational string describing the technique used by the model designer to establish the baseline scores. Allowed values are:

  • max: Indicates that baseline scores are the maximum partial score in element Characteristic
  • min: Baseline scores are the minimum partial score in Characteristic
  • mean: Baseline scores are the mean (weighted average) partial score in Characteristic
  • neutral: Baseline scores are the risk-neutral partial score in Characteristic
  • other: Baseline scores are derived using any other technique.

This attribute is purely informational and does not influence the runtime calculations of reason codes. (By contrast, the reasonCodeAlgorithm is critical to achieving an accurate calculation of reasons.)

class BaselineStratum(val cells: Array[BaselineCell], val value: Any, val maxTime: Double, val label: Option[String]) extends PmmlElement
class BayesInput(val fieldName: Field, val targetValueStats: Option[TargetValueStats], val pairCounts: Array[PairCounts], val derivedField: Option[DerivedField]) extends PmmlElement

For a discrete field, each BayesInput contains the counts pairing the discrete values of that field with those of the target field. For a continuous field, the BayesInput element lists the distributions obtained for that field with each value of the target field. BayesInput may also be used to define how continuous values are encoded as discrete bins. (Discretization is achieved using DerivedField; only the Discretize mapping for DerivedField may be invoked here).

Note that a BayesInput element encompasses either one TargetValueStats element or one or more PairCounts elements. Element DerivedField can only be used in conjunction with PairCounts.
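
The counts carried by BayesInput and BayesOutput combine in the usual naive Bayes way; a simplified sketch (real PMML scoring also applies the model's threshold to zero counts, omitted here, and all names are ours):

```scala
// Simplified naive Bayes posterior from PairCounts-style counts.
// fieldCounts(i)(inputValue)(targetValue) gives the count for field i.
def posterior(
    priorCounts: Map[String, Double],
    fieldCounts: Seq[Map[String, Map[String, Double]]],
    observed: Seq[String]
): Map[String, Double] = {
  val total = priorCounts.values.sum
  val raw = priorCounts.map { case (t, c) =>
    // P(value | target) estimated as count(value, target) / count(target)
    val likelihood =
      fieldCounts.zip(observed).map { case (counts, v) => counts(v)(t) / c }.product
    t -> (c / total) * likelihood
  }
  val z = raw.values.sum
  raw.map { case (t, p) => t -> p / z } // normalize to probabilities
}
```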

class BayesInputs(val inputs: Array[BayesInput]) extends PmmlElement

Contains several BayesInput elements.

class BayesOutput(val fieldName: Field, val targetValueCounts: TargetValueCounts) extends PmmlElement

Contains the counts associated with the values of the target field.

object CatScoringMethod extends Enumeration
class Categories(val categories: Array[Category]) extends PmmlElement
class Category(val value: Any) extends PmmlElement
class Characteristic(val name: Option[String], val reasonCode: Option[String], val baselineScore: Option[Double], val attributes: Array[Attribute]) extends PmmlElement

Defines the point allocation strategy for each scorecard characteristic (numeric or categorical). Once point allocation between input attributes and partial scores takes place, each scorecard characteristic is assigned a single partial score which is used to compute the overall score. The overall score is simply the sum of all partial scores. Partial scores are assumed to be continuous values of type "double".

Value parameters:
attributes

Input attributes for each scorecard characteristic are defined in terms of predicates.

baselineScore

Sets the characteristic's baseline score against which to compare the actual partial score when determining the ranking of reason codes. This attribute is required when useReasonCodes attribute is "true" and attribute baselineScore is not defined in element Scorecard. Whenever baselineScore is defined for a Characteristic, it takes precedence over the baselineScore attribute value defined in element Scorecard. Note that the design-time technique used to determine the baseline scores is captured in the baselineMethod attribute.

name

Name of the characteristic. For informational reasons only.

reasonCode

Contains the characteristic's reason code, which will be later mapped to a business reason usually associated with an adverse decision.
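
As described above, the overall score is the sum of the partial scores, and each characteristic's reason code is ranked by how far its partial score falls from its baseline. A minimal sketch assuming the pointsBelow reason-code algorithm (names are illustrative, not the pmml4s API):

```scala
// Scorecard aggregation: overall score plus reason-code ranking under the
// "pointsBelow" algorithm (baseline - partial); "pointsAbove" flips the sign.
final case class ScoredCharacteristic(reasonCode: String, baseline: Double, partial: Double)

def overallScore(cs: Seq[ScoredCharacteristic]): Double = cs.map(_.partial).sum

def rankedReasonCodes(cs: Seq[ScoredCharacteristic]): Seq[String] =
  cs.sortBy(c => -(c.baseline - c.partial)).map(_.reasonCode)
```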

class Characteristics(val characteristics: Array[Characteristic]) extends PmmlElement

Envelope for all scorecard characteristics.

class Cluster(val id: Option[String], val name: Option[String], val size: Option[Int], val kohonenMap: Option[KohonenMap], val array: Option[Array[Double]], val partition: Option[Partition], val covariances: Option[Covariances]) extends PmmlElement

A cluster is defined by its center vector or by statistics. A center vector is implemented by a NUM-ARRAY. Each Partition corresponds to a cluster and holds field statistics to describe it. The definition of a cluster may contain a center vector as well as statistics. The attribute modelClass in the ClusteringModel defines which one is used to actually define the cluster.

class ClusteringAttributes(val modelClass: ModelClass, val numberOfClusters: Int, val functionName: MiningFunction, val modelName: Option[String], val algorithmName: Option[String], val isScorable: Boolean) extends ModelAttributes with HasClusteringAttributes
class ClusteringField(val field: Field, val comparisons: Option[Comparisons], val isCenterField: Boolean, val fieldWeight: Double, val similarityScale: Option[Double], val compareFunction: Option[CompareFunction]) extends PmmlElement
Value parameters:
compareFunction

A function taking two field values and a similarityScale to define similarity/distance. It can override the general specification of compareFunction in ComparisonMeasure.

comparisons

A matrix which contains the similarity values or distance values.

field

Refers (by name) to a MiningField or to a DerivedField.

fieldWeight

The importance factor for the field. This field weight is used in the comparison functions in order to compute the comparison measure. The value must be a number greater than 0. The default value is 1.0.

isCenterField

Indicates whether the respective field is a center field, i.e. a component of the center, in a center-based model. Only center fields correspond to the entries in the center vectors in order.

similarityScale

The distance such that similarity becomes 0.5.
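
For instance, PMML's gaussSim compare function is exp(−ln(2)·z²/s²), which by construction returns 0.5 when the inner distance z equals the similarityScale s:

```scala
// gaussSim compare function: similarity from inner distance z and
// similarityScale s; gaussSim(s, s) is exactly 0.5.
def gaussSim(z: Double, s: Double): Double =
  math.exp(-math.log(2.0) * z * z / (s * s))
```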

class ClusteringModel(var parent: Model, val attributes: ClusteringAttributes, val miningSchema: MiningSchema, val comparisonMeasure: ComparisonMeasure, val clusteringFields: Array[ClusteringField], val missingValueWeights: Option[MissingValueWeights], val clusters: Array[Cluster], val output: Option[Output], val targets: Option[Targets], val localTransformations: Option[LocalTransformations], val modelStats: Option[ModelStats], val modelExplanation: Option[ModelExplanation], val modelVerification: Option[ModelVerification], val extensions: Seq[Extension]) extends Model with HasWrappedClusteringAttributes

A cluster model basically consists of a set of clusters. For each cluster a center vector can be given. In center-based models a cluster is defined by a vector of center coordinates. Some distance measure is used to determine the nearest center, that is the nearest cluster for a given input record. For distribution-based models (e.g., in demographic clustering) the clusters are defined by their statistics. Some similarity measure is used to determine the best matching cluster for a given record. The center vectors then only approximate the clusters.
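
For the center-based case, assignment reduces to a nearest-center search; a minimal sketch using squared Euclidean distance (real models use the ComparisonMeasure and per-field compare functions declared in the PMML):

```scala
// Nearest-center assignment for a center-based clustering model.
def squaredDistance(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

def nearestCluster(centers: Seq[Array[Double]], record: Array[Double]): Int =
  centers.indices.minBy(i => squaredDistance(centers(i), record))
```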

class Coefficient(val value: Double) extends PmmlElement

Holds a single support vector coefficient αi.

class Coefficients(val coefficients: Array[Coefficient], val absoluteValue: Double) extends PmmlElement

Used to store the support vector coefficients αi and b.

Value parameters:
absoluteValue

Contains the value of the absolute coefficient b.

class Comparisons(val matrix: Matrix) extends PmmlElement

Comparisons is a matrix which contains the similarity values or distance values, depending on the attribute modelClass in ClusteringModel. The order of the rows and columns corresponds to the order of discrete values or intervals in that field.

Defines ComplexPartialScore; the actual partial score is the value returned by the EXPRESSION (see org.pmml4s.transformations for more information).

class CompoundRule(val predicate: Predicate, val rules: Array[Rule]) extends Rule with PmmlElement

CompoundRule consists of a predicate and one or more rules. CompoundRules offer a shorthand for a more compact representation of rulesets and suggest a more efficient execution mechanism.

Value parameters:
predicate

the condition upon which the rule fires.

rules

One or more rules that are contained within the CompoundRule. Each of these rules may be a SimpleRule or a CompoundRule.

class Con(val from: String, val weight: Double) extends PmmlElement

Defines the connections coming into that parent element. The neuron identified by from may be part of any layer.

object ContScoringMethod extends Enumeration
class Covariances(val matrix: Matrix) extends PmmlElement

Stores coordinate-by-coordinate variances (diagonal cells) and covariances (non-diagonal cells).

class CovariateList(val predictors: Array[Predictor]) extends PmmlElement

List of covariate names. Will not be present when there is no covariate. Each name in the list must match a DataField name or a DerivedField name. The covariates will be treated as continuous variables.

object Criterion extends Enumeration
object CumulativeLinkFunction extends Enumeration

Specifies the cumulative link function used in the ordinalMultinomial model.

class DataModel(val version: String, val header: Header, val dataDictionary: DataDictionary, val transformationDictionary: Option[TransformationDictionary]) extends Model

DataModel is a container for all metadata; it is the parent model of all predictive models.

class DecisionTree(var parent: Model) extends EmbeddedModel
object Distribution extends Enumeration

The probability distribution of the dependent variable for generalizedLinear model.

abstract class EmbeddedModel extends Model

Model Composition

class EventValues(val values: Array[Value], val intervals: Array[Interval]) extends PmmlElement
class FactorList(val predictors: Array[Predictor]) extends PmmlElement

List of factor (categorical predictor) names. Not present if this particular regression flavor does not support factors (e.g., linear regression). If present, the list may or may not be empty. Each name in the list must match a DataField name or a DerivedField name. The factors must be categorical variables.

object GeneralModelType extends Enumeration

Specifies the type of regression model in use. This information will be used to select the appropriate mathematical formulas during scoring.

class GeneralRegressionAttributes(val functionName: MiningFunction, val modelType: GeneralModelType, val targetVariableName: Option[String], val targetReferenceCategory: Option[String], val cumulativeLink: Option[CumulativeLinkFunction], val linkFunction: Option[LinkFunction], val linkParameter: Option[Double], val trialsVariable: Option[Field], val trialsValue: Option[Int], val distribution: Option[Distribution], val distParameter: Option[Double], val offsetVariable: Option[Field], val offsetValue: Option[Double], val modelDF: Option[Double], val endTimeVariable: Option[Field], val startTimeVariable: Option[Field], val subjectIDVariable: Option[Field], val statusVariable: Option[Field], val baselineStrataVariable: Option[Field], val modelName: Option[String], val algorithmName: Option[String], val isScorable: Boolean) extends ModelAttributes with HasGeneralRegressionAttributes
class GeneralRegressionModel(var parent: Model, val attributes: GeneralRegressionAttributes, val miningSchema: MiningSchema, val parameterList: ParameterList, val factorList: Option[FactorList], val covariateList: Option[CovariateList], val ppMatrix: PPMatrix, val pCovMatrix: Option[PCovMatrix], val paramMatrix: ParamMatrix, val eventValues: Option[EventValues], val baseCumHazardTables: Option[BaseCumHazardTables], val output: Option[Output], val targets: Option[Targets], val localTransformations: Option[LocalTransformations], val modelStats: Option[ModelStats], val modelExplanation: Option[ModelExplanation], val modelVerification: Option[ModelVerification], val extensions: Seq[Extension]) extends Model with HasWrappedGeneralRegressionAttributes

Definition of a general regression model. As the name suggests, it is intended to support a multitude of regression models.
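
Scoring such a model boils down to a linear predictor passed through the inverse of the configured link function; a sketch covering two common links only (logit and log; names are ours, not the pmml4s API):

```scala
// Generalized-linear scoring sketch: eta = x . beta (+ offset), then the
// inverse link maps eta to the predicted mean. Only two links shown.
def linearPredictor(beta: Array[Double], x: Array[Double], offset: Double = 0.0): Double =
  beta.zip(x).map { case (b, v) => b * v }.sum + offset

def inverseLink(link: String, eta: Double): Double = link match {
  case "logit" => 1.0 / (1.0 + math.exp(-eta)) // binomial mean in (0, 1)
  case "log"   => math.exp(eta)                // e.g. Poisson mean
  case other   => sys.error(s"link not covered in this sketch: $other")
}
```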

class InstanceField(val field: String, val column: Option[String]) extends PmmlElement
Value parameters:
column

Defines the name of the tag or column used by element InlineTable. This attribute is required if element InlineTable is used to represent training data.

field

Contains the name of a DataField or a DerivedField (in case isTransformed is set to "true"). Can also contain the name of the case ID variable.

class InstanceFields(val instanceFields: Array[InstanceField]) extends PmmlElement

Serves as an envelope for all the fields included in the training instances. It encapsulates InstanceField elements.

Serves as an envelope for all the fields included in the training instances. It encapsulates InstanceField elements.

class Item(val id: String, val value: Any, @Since("4.3") val field: Option[Field], @Since("4.3") val category: Option[String], val mappedValue: Option[String], val weight: Option[Double]) extends PmmlElement

Obviously the id of an Item must be unique. Furthermore, the Item values must be unique; if they are not, the field and category attributes must distinguish them. That is, an AssociationModel must not have different instances of Item where the values of the value, field, and category attributes are all the same. The entries in mappedValue may be the same, though.

Value parameters:
id

An identification to uniquely identify an item.

mappedValue

Optional, a value to which the original item value is mapped. For instance, this could be a product name if the original value is an EAN code.

value

The value of the item as in the input data.

weight

The weight of the item. For example, the price or value of an item.

class ItemRef(val itemRef: String) extends PmmlElement

Item references point to elements of type Item.

Value parameters:
itemRef

Contains the identification of an item.

class Itemset(val itemRefs: Set[ItemRef], val id: String, val support: Option[Double], val numberOfItems: Option[Int]) extends PmmlElement
Value parameters:
id

An identification to uniquely identify an Itemset.

itemRefs

Item references point to elements of type Item.

numberOfItems

The number of Items contained in this Itemset.

support

The relative support of the Itemset: support(set) = (number of transactions containing the set) / (total number of transactions)

class KNNInput(val field: Field, val compareFunction: Option[CompareFunction], val fieldWeight: Double) extends PmmlElement
Value parameters:
field

Contains the name of a DataField or a DerivedField. If a DerivedField is used and isTransformed is false, the training instances will also need to be transformed together with the k-NN input.

fieldWeight

Defines the importance factor for the field. It is used in the comparison functions to compute the comparison measure. The value must be a number greater than 0. The default value is 1.0.

class KNNInputs(val knnInputs: Array[KNNInput]) extends PmmlElement

Encapsulates several KNNInput elements, which define the fields used to query the k-NN model, one KNNInput element per field.

object KernelType
Companion:
class
class KohonenMap(val coord1: Option[Double], val coord2: Option[Double], val coord3: Option[Double]) extends PmmlElement

The element KohonenMap is appropriate for clustering models that were produced by a Kohonen map algorithm. The attributes coord1, coord2 and coord3 describe the position of the current cluster in a map with up to three dimensions. This element is not relevant to the scoring function.

class LinearKernelType(val description: Option[String]) extends KernelType with PmmlElement

Linear basis functions which lead to a hyperplane as classifier. K(x,y) = <x,y>
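
The formula above is just the dot product:

```scala
// Linear SVM kernel: K(x, y) = <x, y>.
def linearKernel(x: Array[Double], y: Array[Double]): Double =
  x.zip(y).map { case (a, b) => a * b }.sum
```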

object LinkFunction extends Enumeration

Specifies the type of link function to use when the generalizedLinear model type is specified.

class MeanClusterDistances(val array: Array[Double]) extends PmmlElement

Contains an array of non-negative real values; it is required when the algorithm type is clusterMeanDist. The length of the array must equal the number of clusters in the model, and the values in it are the mean distances/similarities to the center for each cluster.
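
One plausible reading of clusterMeanDist scoring (our interpretation, not a statement of pmml4s internals): the anomaly score is the record's distance to its nearest cluster center divided by that cluster's stored mean distance, so ratios well above 1 suggest anomalies:

```scala
// clusterMeanDist sketch: ratio of the distance to the nearest cluster
// center over that cluster's mean distance from MeanClusterDistances.
def clusterMeanDistScore(distances: Seq[Double], meanDistances: Seq[Double]): Double = {
  val nearest = distances.indices.minBy(distances)
  distances(nearest) / meanDistances(nearest)
}
```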

Contains an array of non-negative real values, it is required when the algorithm type is clusterMeanDist. The length of the array must equal the number of clusters in the model, and the values in it are the mean distances/similarities to the center for each cluster.

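For the clusterMeanDist algorithm, the anomaly score is typically derived by comparing a point's distance to its nearest cluster center against that cluster's mean distance. A minimal sketch; the ratio formulation here is an illustrative assumption, not something this element mandates:

```python
def cluster_mean_dist_score(distances, mean_cluster_distances):
    """Score a point given its distance to each cluster center and the
    per-cluster mean distances from MeanClusterDistances."""
    # Nearest cluster for this point
    k = min(range(len(distances)), key=lambda i: distances[i])
    # Ratio > 1 means the point lies farther out than the cluster's average member
    return distances[k] / mean_cluster_distances[k]

print(cluster_mean_dist_score([2.0, 9.0], [1.0, 3.0]))  # nearest cluster 0 -> 2.0
```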
class MiningModel(var parent: Model, val attributes: ModelAttributes, val miningSchema: MiningSchema, val embeddedModels: Array[EmbeddedModel], val segmentation: Option[Segmentation], val output: Option[Output], val targets: Option[Targets], val localTransformations: Option[LocalTransformations], val modelStats: Option[ModelStats], val modelExplanation: Option[ModelExplanation], val modelVerification: Option[ModelVerification], val extensions: Seq[Extension]) extends Model with HasWrappedModelAttributes

The element MiningModel allows precise specification of the usage of multiple models within one PMML file. The two main approaches are Model Composition, and Segmentation.

Model Composition includes model sequencing and model selection but is only applicable to Tree and Regression models. Segmentation allows representation of different models for different data segments and also can be used for model ensembles and model sequences. Scoring a case using a model ensemble consists of scoring it using each model separately, then combining the results into a single scoring result using one of the pre-defined combination methods. Scoring a case using a sequence, or chain, of models allows the output of one model to be passed in as input to subsequent models.

ModelComposition uses "embedded model elements" that are defeatured copies of "standalone model elements" -- specifically, Regression for RegressionModel, DecisionTree for TreeModel. Besides being limited to Regression and Tree models, these embedded model elements lack key features like a MiningSchema (essential to manage scope across multiple model elements). Therefore, in PMML 4.2, the Model Composition approach has been deprecated since the Segmentation approach allows for a wider range of models to be used more reliably. For more on deprecation, see Conformance.

Segmentation is accomplished by using any PMML model element inside of a Segment element, which also contains a PREDICATE and an optional weight. MiningModel then contains Segmentation element with a number of Segment elements as well as the attribute multipleModelMethod specifying how all the models applicable to a record should be combined. It is also possible to use a combination of model composition and segmentation approaches, using simple regression or decision trees for data preprocessing before segmentation.
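The segmentation flow can be sketched as: evaluate each segment's predicate, score the applicable models, then combine with the declared multipleModelMethod. The predicates, model functions, and weighted averaging below are simplified stand-ins for the PMML structures:

```python
def score_segmentation(segments, record, method="weightedAverage"):
    """segments: list of (predicate, model_fn, weight) triples."""
    # Score every segment whose predicate evaluates to true for this record
    results = [(model(record), w) for pred, model, w in segments if pred(record)]
    if not results:
        return None
    if method == "weightedAverage":
        total_w = sum(w for _, w in results)
        return sum(v * w for v, w in results) / total_w
    raise ValueError(f"unsupported method: {method}")

segs = [
    (lambda r: r["x"] < 5, lambda r: 1.0, 1.0),
    (lambda r: True,       lambda r: 3.0, 1.0),
]
print(score_segmentation(segs, {"x": 2}))  # both segments apply -> (1.0 + 3.0) / 2 = 2.0
```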

object MissingPredictionTreatment extends Enumeration

The missing prediction treatment options are used when at least one model for which the predicate in the Segment evaluates to true has a missing result. The attribute missingThreshold is closely related and has default value 1. The options are defined as follows:

  • returnMissing means that if at least one model has a missing result, the whole MiningModel's result should be missing.
  • skipSegment says that if a model has a missing result, that segment is ignored and the results are computed based on the other segments. However, if the fraction of models with missing results (weighted, if the model combination method is weighted) exceeds the missingThreshold, the returned result must be missing. This option should not be used with the modelChain combination method.
  • continue says that if a model has a missing result, the processing should continue normally. This can work well for voting or modelChain situations, as well as returnFirst and returnAll. In case of majorityVote or weightedMajorityVote, the missing result can be returned if it gets the most (possibly weighted) votes, or if the fraction of models with missing results exceeds the missingThreshold; otherwise a valid result is computed normally. Other model combination methods will return a missing value as the result.
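The skipSegment option can be sketched as follows; the fraction of missing results is checked against missingThreshold before the rest are combined (plain averaging stands in for the configured combination method):

```python
def skip_segment_combine(results, missing_threshold=1.0):
    """results: per-segment predictions, None meaning a missing result."""
    missing = sum(1 for r in results if r is None)
    if results and missing / len(results) > missing_threshold:
        return None  # too many missing -> the whole MiningModel's result is missing
    valid = [r for r in results if r is not None]
    return sum(valid) / len(valid) if valid else None

print(skip_segment_combine([1.0, None, 3.0]))        # 1/3 missing, combine the rest -> 2.0
print(skip_segment_combine([1.0, None, None], 0.5))  # 2/3 missing > 0.5 -> None
```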
object MissingValueStrategy extends Enumeration

Defines a strategy for dealing with missing values.

class MissingValueWeights(val array: Array[Double]) extends PmmlElement

MissingValueWeights is used to adjust distance or similarity measures for missing data.

Abstract class that represents a PMML model

Companion:
object
object Model
Companion:
class
object ModelClass extends Enumeration
sealed trait ModelElement
Companion:
object
Companion:
class
object MultipleModelMethod extends Enumeration

Specifies how all the models applicable to a record should be combined.

class MutableModel extends Model
object NNNormalizationMethod extends Enumeration

A normalization method softmax ( pj = exp(yj) / Sumi(exp(yi) ) ) or simplemax ( pj = yj / Sumi(yi) ) can be applied to the computed activation values. The attribute normalizationMethod is defined for the network with default value none ( pj = yj ), but can be specified for each layer as well. Softmax normalization is most often applied to the output layer of a classification network to get the probabilities of all answers. Simplemax normalization is often applied to the hidden layer consisting of elements with radial basis activation function to get a "normalized RBF" activation.

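The two normalization methods can be written directly from the formulas above:

```python
import math

def softmax(ys):
    exps = [math.exp(y) for y in ys]
    s = sum(exps)
    return [e / s for e in exps]   # p_j = exp(y_j) / Sum_i exp(y_i)

def simplemax(ys):
    s = sum(ys)
    return [y / s for y in ys]     # p_j = y_j / Sum_i y_i

print(simplemax([1.0, 3.0]))             # [0.25, 0.75]
print(sum(softmax([0.2, 1.1, -0.3])))    # probabilities sum to 1.0
```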
class NaiveBayesAttributes(val threshold: Double, val functionName: MiningFunction, val modelName: Option[String], val algorithmName: Option[String], val isScorable: Boolean) extends ModelAttributes with HasNaiveBayesAttributes
class NaiveBayesModel(var parent: Model, val attributes: NaiveBayesAttributes, val miningSchema: MiningSchema, val bayesInputs: BayesInputs, val bayesOutput: BayesOutput, val output: Option[Output], val targets: Option[Targets], val localTransformations: Option[LocalTransformations], val modelStats: Option[ModelStats], val modelExplanation: Option[ModelExplanation], val modelVerification: Option[ModelVerification], val extensions: Seq[Extension]) extends Model with HasWrappedNaiveBayesAttributes

Naïve Bayes uses Bayes' Theorem, combined with a ("naive") presumption of conditional independence, to predict the value of a target (output), from evidence given by one or more predictor (input) fields.

Naïve Bayes models require the target field to be discretized so that a finite number of values are considered by the model.

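The threshold attribute exists because a zero conditional probability would annihilate the whole product of likelihoods; scoring typically substitutes the threshold for zero counts. A minimal sketch, where the count layout and numbers are illustrative assumptions:

```python
def naive_bayes_score(evidence, counts, priors, threshold):
    """counts[field][value][target] -> count; returns unnormalized scores."""
    scores = {}
    for target, prior in priors.items():
        p = prior
        for field, value in evidence.items():
            tvc = counts[field][value]
            total = sum(tvc.values())
            # A zero count is replaced by threshold so the product survives
            p *= (tvc[target] / total) if tvc[target] > 0 else threshold
        scores[target] = p
    return scores

counts = {"age": {"young": {"yes": 0, "no": 4}}}
print(naive_bayes_score({"age": "young"}, counts, {"yes": 0.5, "no": 0.5}, 0.001))
```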
class NearestNeighborAttributes(val functionName: MiningFunction, val numberOfNeighbors: Int, val continuousScoringMethod: ContScoringMethod, val categoricalScoringMethod: CatScoringMethod, val instanceIdVariable: Option[String], val threshold: Double, val modelName: Option[String], val algorithmName: Option[String], val isScorable: Boolean) extends ModelAttributes with HasNearestNeighborAttributes
class NearestNeighborModel(var parent: Model, val attributes: NearestNeighborAttributes, val miningSchema: MiningSchema, val trainingInstances: TrainingInstances, val comparisonMeasure: ComparisonMeasure, val knnInputs: KNNInputs, val output: Option[Output], val targets: Option[Targets], val localTransformations: Option[LocalTransformations], val modelStats: Option[ModelStats], val modelExplanation: Option[ModelExplanation], val modelVerification: Option[ModelVerification], val extensions: Seq[Extension]) extends Model with HasWrappedNearestNeighborAttributes

k-Nearest Neighbors (k-NN) is an instance-based learning algorithm. In a k-NN model, a hypothesis or generalization is built from the training data directly at the time a query is made to the system. The prediction is based on the K training instances closest to the case being scored. Therefore, all training cases have to be stored, which may be problematic when the amount of data is large. This model has the ability to store the data directly in PMML using InlineTable or elsewhere using the TableLocator element defined in the Taxonomy document.

A k-NN model can have one or more target variables or no targets. When one or more targets are present, the predicted value is computed based on the target values of the nearest neighbors. When no targets are present, the model specifies a case ID variable for the training data. In this way, one can easily obtain the IDs of the K closest training cases (nearest neighbors).

A k-NN model consists of four major parts:

  • Model attributes
  • Training instances
  • Comparison measure
  • Input fields
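Scoring against the stored training instances can be sketched as: compute the comparison measure against every instance, keep the K closest, and derive the prediction from their targets. Averaging (a continuousScoringMethod) and Euclidean distance stand in for the configured scoring method and ComparisonMeasure:

```python
import math

def knn_predict(instances, query, k):
    """instances: list of (feature_vector, target); returns the average of
    the k nearest targets (continuous scoring by averaging)."""
    nearest = sorted(instances, key=lambda it: math.dist(it[0], query))[:k]
    return sum(t for _, t in nearest) / k

train = [([0.0], 1.0), ([1.0], 2.0), ([10.0], 100.0)]
print(knn_predict(train, [0.5], k=2))  # neighbors at 0.0 and 1.0 -> 1.5
```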
class NeuralInput(val id: String, val derivedField: DerivedField) extends PmmlElement

Defines how input fields are normalized so that the values can be processed in the neural network. For example, string values must be encoded as numeric values.

class NeuralInputs(val neuralInputs: Array[NeuralInput], val numberOfInputs: Option[Int]) extends PmmlElement

An input neuron represents the normalized value for an input field. A numeric input field is usually mapped to a single input neuron while a categorical input field is usually mapped to a set of input neurons using some fan-out function. The normalization is defined using the elements NormContinuous and NormDiscrete defined in the Transformation Dictionary. The element DerivedField is the general container for these transformations.

class NeuralLayer(val neurons: Array[Neuron], val numberOfNeurons: Option[Int], val activationFunction: Option[ActivationFunction], val threshold: Option[Double], val width: Option[Double], val altitude: Option[Double], val normalizationMethod: Option[NNNormalizationMethod]) extends PmmlElement
class NeuralNetwork(var parent: Model, val attributes: NeuralNetworkAttributes, val miningSchema: MiningSchema, val neuralInputs: NeuralInputs, val neuralLayers: Array[NeuralLayer], val neuralOutputs: NeuralOutputs, val output: Option[Output], val targets: Option[Targets], val localTransformations: Option[LocalTransformations], val modelStats: Option[ModelStats], val modelExplanation: Option[ModelExplanation], val modelVerification: Option[ModelVerification], val extensions: Seq[Extension]) extends Model with HasWrappedNeuralNetworkAttributes

A neural network has one or more input nodes and one or more neurons. Some neurons' outputs are the output of the network. The network is defined by the neurons and their connections, aka weights. All neurons are organized into layers; the sequence of layers defines the order in which the activations are computed. All output activations for neurons in some layer L are evaluated before computation proceeds to the next layer L+1. Note that this allows for recurrent networks where outputs of neurons in layer L+i can be used as input in layer L where L+i > L. The model does not define a specific evaluation order for neurons within a layer.

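The layer-by-layer evaluation order can be sketched for a simple feed-forward case (logistic activation assumed; recurrent connections omitted):

```python
import math

def forward(layers, inputs):
    """layers: list of layers; each neuron is (bias, {input_id: weight}).
    All activations of layer L are computed before layer L+1 starts."""
    acts = dict(inputs)  # id -> activation, seeded by the input neurons
    for layer in layers:
        new = {}
        for nid, (bias, cons) in layer.items():
            z = bias + sum(w * acts[src] for src, w in cons.items())
            new[nid] = 1.0 / (1.0 + math.exp(-z))  # logistic activation
        acts.update(new)  # layer finished before the next one begins
    return acts

out = forward([{"h1": (0.0, {"i1": 1.0})}], {"i1": 0.0})
print(out["h1"])  # logistic(0) = 0.5
```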
class NeuralNetworkAttributes(val functionName: MiningFunction, val activationFunction: ActivationFunction, val normalizationMethod: NNNormalizationMethod, val threshold: Double, val width: Option[Double], val altitude: Double, val numberOfLayers: Option[Int], val modelName: Option[String], val algorithmName: Option[String], val isScorable: Boolean) extends ModelAttributes with HasNeuralNetworkAttributes
class NeuralOutput(val outputNeuron: String, val derivedField: DerivedField) extends PmmlElement

Defines how the output of the neural network must be interpreted.

class NeuralOutputs(val neuralOutputs: Array[NeuralOutput], val numberOfOutputs: Option[Int]) extends PmmlElement
class Neuron(val cons: Array[Con], val id: String, val bias: Option[Double], val width: Option[Double], val altitude: Option[Double]) extends PmmlElement

Contains an identifier id which must be unique in all layers. The attribute bias implicitly defines a connection to a bias unit where the unit's value is 1.0 and the weight is the value of bias. The activation function and normalization method for Neuron can be defined in NeuralLayer. If either one is not defined for the layer then the default one specified for NeuralNetwork applies. If the activation function is radialBasis, the attribute width must be specified either in Neuron, NeuralLayer or NeuralNetwork. Again, width specified in Neuron will override a respective value from NeuralLayer, and in turn will override a value given in NeuralNetwork.

Weighted connections between neural net nodes are represented by Con elements.
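The override precedence for width (Neuron over NeuralLayer over NeuralNetwork) amounts to taking the first defined value:

```python
def effective_width(neuron_width, layer_width, network_width):
    """Neuron-level width wins, then the layer's, then the network default."""
    for w in (neuron_width, layer_width, network_width):
        if w is not None:
            return w
    raise ValueError("radialBasis requires width at some level")

print(effective_width(None, 2.0, 1.0))  # layer value overrides the network default
```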

object NoTrueChildStrategy extends Enumeration

Defines what to do in situations where scoring cannot reach a leaf node.

class Node(val predicate: Predicate, val children: Array[Node], val id: Option[String], val score: Option[Any], val recordCount: Option[Double], val defaultChild: Option[String], val scoreDistributions: ScoreDistributions, val partition: Option[Partition], val embeddedModel: Option[EmbeddedModel]) extends Predicate with HasScoreDistributions

This element is an encapsulation for either defining a split or a leaf in a tree model. Every Node contains a predicate that identifies a rule for choosing itself or any of its siblings. A predicate may be an expression composed of other nested predicates.

class PCell(val parameterName: String, val beta: Double, val targetCategory: Option[Any], val df: Option[Int]) extends PmmlElement

Cell in the ParamMatrix. The optional targetCategory and required parameterName attributes determine the cell's location in the Parameter matrix. The information contained is: beta (actual Parameter value, required), and df (degrees of freedom, optional). For ordinalMultinomial model ParamMatrix specifies different values for the intercept parameter: one for each target category except one. Values for all other parameters are constant across all target variable values. For multinomialLogistic model ParamMatrix specifies parameter estimates for each target category except the reference category.

class PCovCell(val pRow: String, val pCol: String, val value: Double, val tRow: Option[String], val tCol: Option[String], val targetCategory: Option[Any]) extends PmmlElement
class PCovMatrix(val cells: Array[PCovCell], val tpe: Option[PCovMatrixType]) extends PmmlElement

Matrix of Parameter estimate covariances. Made up of PCovCells, each of them being located via row information for Parameter name (pRow), row information for target variable value (tRow), column information for Parameter name (pCol) and column information for target variable value (tCol). Note that the matrix is symmetric with respect to the main diagonal (interchanging tRow and tCol together with pRow and pCol will not change the value). Therefore it is sufficient that only half of the matrix be exported. Attributes tRow and tCol are optional since they are not needed for linear regression models. This element has an optional attribute type that can take values model and robust. This attribute describes the way the covariance matrix was computed in generalizedLinear model. The robust option is also known as Huber-White or sandwich or HCCM.

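Because the matrix is symmetric, an exporter may emit only one triangle, and a reader mirrors each cell. A sketch keyed only on parameter names (tRow/tCol omitted, as in linear regression models):

```python
def build_pcov(cells):
    """cells: list of (pRow, pCol, value); returns a dict holding both
    (pRow, pCol) and its mirrored (pCol, pRow) entry."""
    m = {}
    for p_row, p_col, value in cells:
        m[(p_row, p_col)] = value
        m[(p_col, p_row)] = value  # symmetry: swapping row and col preserves the value
    return m

m = build_pcov([("p0", "p0", 2.0), ("p0", "p1", 0.5)])
print(m[("p1", "p0")])  # mirrored from the exported half -> 0.5
```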
object PCovMatrixType extends Enumeration
class PPCell(val value: Any, val predictorName: Field, val parameterName: String, val targetCategory: Option[Any]) extends PmmlElement

Cell in the PPMatrix. Knows its row name, column name.

class PPMatrix(val cells: Array[PPCell]) extends PmmlElement

Predictor-to-Parameter correlation matrix. It is a rectangular matrix having a column for each Predictor (factor or covariate) and a row for each Parameter. The matrix is represented as a sequence of cells, each cell containing a number representing the correlation between the Predictor and the Parameter.

class PairCounts(val value: Any, val targetValueCounts: TargetValueCounts) extends PmmlElement

PairCounts lists, for a field Ii's discrete value Iij, the TargetValueCounts that pair the value Iij with each value of the target field.

class ParamMatrix(val cells: Array[PCell]) extends PmmlElement

Parameter matrix. A table containing the Parameter values along with associated statistics (degrees of freedom). One dimension has the target variable's categories, the other has the Parameter names. The table is represented by specifying each cell. There is no requirement for Parameter names other than that each name should uniquely identify one Parameter.

class Parameter(val name: String, val label: Option[String], val referencePoint: Double) extends PmmlElement

Each Parameter contains a required name and optional label.

Value parameters:
label

If present, is meant to give a hint on a Parameter's correlation with the Predictors.

name

Should be unique within the model and as brief as possible (since Parameter names appear frequently in the document).

referencePoint

The optional attribute referencePoint is used in Cox regression models only and has a default value of 0

class ParameterList(val parameters: Array[Parameter]) extends PmmlElement

Lists all Parameters. ParameterList can be empty only for CoxRegression models, for other models at least one Parameter should be present.

class PolynomialKernelType(val gamma: Double, val coef0: Double, val degree: Double, val description: Option[String]) extends KernelType with PmmlElement

Polynomial basis functions which lead to a polynomial classifier. K(x,y) = (gamma*<x,y> + coef0)^degree

class Predictor(val name: String, val contrastMatrixType: Option[String], val categories: Option[Categories], val matrix: Option[Matrix]) extends PmmlElement

Describes a categorical (factor) or a continuous (covariate) predictor for the model. When describing a factor, it can optionally contain a list of categories and a contrast matrix. Such matrix describes the codings of categorical variables. If a categorical variable has n values, there will be n rows and n-1 or n columns in the matrix. The rows and columns correspond to the categories of the factor in the order listed in the Category element if it is present, otherwise in the order listed in the DataField or DerivedField element. If the Categories element is present and the corresponding DataField or DerivedField element has a list of valid categories, then the list in Categories should be a subset of that in DataField or DerivedField. A contrast matrix with n-1 columns helps to reduce the total number of parameters in the model.

class RadialBasisKernelType(val gamma: Double, val description: Option[String]) extends KernelType with PmmlElement

Radial basis functions, the most common kernel type. K(x,y) = exp(-gamma*||x - y||^2)
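The three kernel functions can be written directly from their formulas; gamma, coef0, and degree correspond to the attributes of these elements:

```python
import math

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))            # K(x,y) = <x,y>

def polynomial_kernel(x, y, gamma, coef0, degree):
    return (gamma * linear_kernel(x, y) + coef0) ** degree

def radial_basis_kernel(x, y, gamma):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))       # ||x - y||^2
    return math.exp(-gamma * sq)

print(linear_kernel([1, 2], [3, 4]))                   # 11
print(polynomial_kernel([1, 2], [3, 4], 1.0, 1.0, 2))  # (11 + 1)^2 = 144.0
print(radial_basis_kernel([0, 0], [0, 0], 0.5))        # identical points -> 1.0
```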

object ReasonCodeAlgorithm extends Enumeration

Describes how reason codes shall be ranked.

class Regression(var parent: Model) extends EmbeddedModel
class RegressionAttributes(val functionName: MiningFunction, val modelName: Option[String], val algorithmName: Option[String], val isScorable: Boolean, val modelType: Option[RegressionModelType], val targetFieldName: Option[String], val normalizationMethod: RegressionNormalizationMethod) extends ModelAttributes with HasRegressionAttributes
class RegressionModel(var parent: Model, val attributes: RegressionAttributes, val miningSchema: MiningSchema, val regressionTables: Array[RegressionTable], val output: Option[Output], val targets: Option[Targets], val localTransformations: Option[LocalTransformations], val modelStats: Option[ModelStats], val modelExplanation: Option[ModelExplanation], val modelVerification: Option[ModelVerification], val extensions: Seq[Extension]) extends Model with HasWrappedRegressionAttributes

The regression functions are used to determine the relationship between the dependent variable (target field) and one or more independent variables. The dependent variable is the one whose values you want to predict, whereas the independent variables are the variables that you base your prediction on. While the term regression usually refers to the prediction of numeric values, the PMML element RegressionModel can also be used for classification. This is due to the fact that multiple regression equations can be combined in order to predict categorical values.

object RegressionModelType extends Enumeration

Specifies the type of a regression model. The attribute modelType is for information only.

object RegressionNormalizationMethod extends Enumeration

Describes how the prediction is converted into a confidence value (aka probability).

sealed trait Rule
Companion:
object
object Rule
Companion:
class
class RuleSelectionMethod(val criterion: Criterion) extends PmmlElement

Describes how rules are selected to apply the model to a new case.

Value parameters:
criterion

explains how to determine and rank predictions and their associated confidences from the ruleset in case multiple rules fire.

class RuleSet(val ruleSelectionMethods: Array[RuleSelectionMethod], val scoreDistributions: ScoreDistributions, val rules: Array[Rule], val recordCount: Option[Int], val nbCorrect: Option[Int], val defaultScore: Option[Any], val defaultConfidence: Option[Double]) extends PmmlElement
Value parameters:
defaultConfidence

provides a confidence to be returned with the default score (when scoring a case and no rules in the ruleset fire).

defaultScore

The value of score in a RuleSet serves as the default predicted value when scoring a case and no rules in the ruleset fire.

nbCorrect

indicates the number of training/test instances for which the default score is correct.

recordCount

The number of training/test cases to which the ruleset was applied to generate support and confidence measures for individual rules.

ruleSelectionMethods

specifies how to select rules from the ruleset to score a new case. If more than one method is included, the first method is used as the default method for scoring, but the other methods included may be selected by the application wishing to perform scoring as valid alternative methods.

rules

contains 0 or more rules which comprise the ruleset.

scoreDistributions

describe the distribution of the predicted value in the test/training data.

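For example, the firstHit criterion returns the prediction of the first rule whose predicate fires, falling back to the default score when none do:

```python
def score_ruleset(rules, record, default_score=None):
    """rules: list of (predicate, score, confidence); firstHit selection."""
    for predicate, score, confidence in rules:
        if predicate(record):
            return score, confidence  # first firing rule wins
    return default_score, None        # no rule fired -> default score

rules = [
    (lambda r: r["x"] > 10, "high", 0.9),
    (lambda r: r["x"] > 0,  "low",  0.8),
]
print(score_ruleset(rules, {"x": 5}))            # ('low', 0.8)
print(score_ruleset(rules, {"x": -1}, "none"))   # ('none', None)
```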
class RuleSetModel(var parent: Model, val attributes: ModelAttributes, val miningSchema: MiningSchema, val ruleSet: RuleSet, val output: Option[Output], val targets: Option[Targets], val localTransformations: Option[LocalTransformations], val modelStats: Option[ModelStats], val modelExplanation: Option[ModelExplanation], val modelVerification: Option[ModelVerification], val extensions: Seq[Extension]) extends Model with HasWrappedModelAttributes

Ruleset models can be thought of as flattened decision tree models. A ruleset consists of a number of rules. Each rule contains a predicate and a predicted class value, plus some information collected at training or testing time on the performance of the rule.

object SVMClassificationMethod extends Enumeration

The two most popular methods for multi-class classification are one-against-all (also known as one-against-rest) and one-against-one. Depending on the method used, the number of SVMs built will differ.

The SVM classification method specifies which of the two methods is used:

object SVMRepresentation extends Enumeration

Usually the SVM model uses support vectors to define the model function. However, for the case of a linear function (linear kernel type) the function is a linear hyperplane that can be more efficiently expressed using the coefficients of all mining fields. In this case, no support vectors are required at all, and hence SupportVectors will be absent and only the Coefficients element is necessary.

The SVM representation specifies which of the two representations is used.

class Scorecard(var parent: Model, val attributes: ScorecardAttributes, val miningSchema: MiningSchema, val characteristics: Characteristics, val output: Option[Output], val targets: Option[Targets], val localTransformations: Option[LocalTransformations], val modelStats: Option[ModelStats], val modelExplanation: Option[ModelExplanation], val modelVerification: Option[ModelVerification], val extensions: Seq[Extension]) extends Model with HasWrappedScorecardAttributes

A data mining model contains a set of input fields which are used to predict a certain target value. This prediction can be seen as an assessment about a prospect, a customer, or a scenario for which an outcome is predicted based on historical data. In a scorecard, input fields, also referred to as characteristics (for example, "age"), are broken down into attributes (for example, "19-29" and "30-39" age groups or ranges) with specific partial scores associated with them. These scores represent the influence of the input attributes on the target and are readily available for inspection. Partial scores are then summed up so that an overall score can be obtained for the target value.

Scorecards are very popular in the financial industry for their interpretability and ease of implementation, and because input attributes can be mapped to a series of reason codes which provide explanations of each individual's score. Usually, the lower the overall score produced by a scorecard, the higher the chances of it triggering an adverse decision, which usually involves the referral or denial of services. Reason codes, as the name suggests, allow for an explanation of scorecard behavior and any adverse decisions generated as a consequence of the overall score. They basically answer the question: "Why is the score low, given its input conditions?"
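The partial-score summation can be sketched as follows; the `Attr` and `Characteristic` case classes are simplified illustrations of the idea, not the pmml4s API:

```scala
// Simplified sketch of scorecard evaluation: for each characteristic, the
// first attribute whose predicate matches the input contributes its partial
// score; the overall score is the initial score plus the sum of all
// contributed partial scores.
final case class Attr(matches: Map[String, Double] => Boolean, partialScore: Double)
final case class Characteristic(name: String, attrs: Seq[Attr])

def overallScore(initialScore: Double, cs: Seq[Characteristic], in: Map[String, Double]): Double =
  initialScore + cs.flatMap(c => c.attrs.find(_.matches(in)).map(_.partialScore)).sum

// The "age" characteristic from the text, broken into two range attributes.
val age = Characteristic("age", Seq(
  Attr(in => in("age") >= 19 && in("age") <= 29, 15.0), // "19-29" bucket
  Attr(in => in("age") >= 30 && in("age") <= 39, 25.0)  // "30-39" bucket
))
```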

class ScorecardAttributes(val functionName: MiningFunction, val modelName: Option[String], val algorithmName: Option[String], val isScorable: Boolean, val initialScore: Double, val useReasonCodes: Boolean, val reasonCodeAlgorithm: ReasonCodeAlgorithm, val baselineScore: Option[Double], val baselineMethod: BaselineMethod) extends ModelAttributes with HasScorecardAttributes

Holds attributes of a Scorecard.

class Segment(val predicate: Predicate, val model: Model, val variableWeight: Option[VariableWeight], val id: Option[String], val weight: Double) extends Predictable with Predicate with PmmlElement
class Segmentation(val multipleModelMethod: MultipleModelMethod, val segments: Array[Segment], val missingPredictionTreatment: MissingPredictionTreatment, val missingThreshold: Double) extends PmmlElement
class SigmoidKernelType(val gamma: Double, val coef0: Double, val description: Option[String]) extends KernelType with PmmlElement

Sigmoid kernel function for some models of Neural Network type: K(x,y) = tanh(gamma*&lt;x,y&gt;+coef0)
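The kernel is straightforward to write out; a minimal sketch, with &lt;x,y&gt; computed as the dot product of the two vectors:

```scala
import scala.math.tanh

// The sigmoid kernel as given above: K(x, y) = tanh(gamma * <x, y> + coef0).
def sigmoidKernel(gamma: Double, coef0: Double)(x: Array[Double], y: Array[Double]): Double = {
  require(x.length == y.length, "vectors must have the same length")
  val dot = x.zip(y).map { case (a, b) => a * b }.sum
  tanh(gamma * dot + coef0)
}
```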

class SimpleRule(val predicate: Predicate, val scoreDistributions: ScoreDistributions, val score: Any, val id: Option[String], val recordCount: Option[Int], val nbCorrect: Option[Int], val confidence: Double, val weight: Double) extends Rule with HasScoreDistributions with PmmlElement

SimpleRule consists of an identifier, a predicate, a score and information on rule performance.

Value parameters:
confidence

Indicates the confidence of the rule.

id

The value of id serves as a unique identifier for the rule. Must be unique within the ruleset.

nbCorrect

Indicates the number of training/test instances on which the rule fired and the prediction was correct.

predicate

the condition upon which the rule fires. For more details on PREDICATE see the section on predicates in TreeModel. This explains how predicates are described and evaluated and how missing values are handled.

recordCount

The number of training/test instances on which the rule fired.

score

The predicted value when the rule fires.

scoreDistributions

Describes the distribution of the predicted value for instances where the rule fires in the training/test data.

weight

Indicates the relative importance of the rule. May or may not be equal to the confidence.
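Putting these attributes together, a minimal "first hit" evaluation of simple rules might look like this (the `Rule` case class here is an illustration, not the library's `SimpleRule`):

```scala
// Illustration only: return the score and confidence of the first rule
// whose predicate fires for the given input, if any rule fires at all.
final case class Rule(id: String, fires: Map[String, Double] => Boolean,
                      score: String, confidence: Double)

def firstHit(rules: Seq[Rule], in: Map[String, Double]): Option[(String, Double)] =
  rules.find(_.fires(in)).map(r => (r.score, r.confidence))

// Two rules; the more specific one is listed first so it wins when both fire.
val rules = Seq(
  Rule("r1", m => m("x") > 10.0, "high", 0.9),
  Rule("r2", m => m("x") > 0.0,  "low",  0.7)
)
```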

object SplitCharacteristic extends Enumeration

Indicates whether non-leaf Nodes in the tree model have exactly two children, or an unrestricted number of children.

class SupportVector(val vectorId: String) extends PmmlElement

SupportVector has only the attribute vectorId, a reference to the support vector in the VectorDictionary.

class SupportVectorMachine(val supportVectors: Option[SupportVectors], val coefficients: Coefficients, val targetCategory: Option[Any], val alternateTargetCategory: Option[Any], val threshold: Option[Double]) extends PmmlElement

Holds a single instance of an SVM.

SupportVectors holds the support vectors as references towards VectorDictionary used by the respective SVM instance. For storing the SVM coefficients, the element Coefficients is used. Both are combined in the element SupportVectorMachine, which holds a single instance of an SVM.

The attribute targetCategory is required for classification models and gives the corresponding class label. This attribute is to be used for classification models implementing the one-against-all method. In this method, for n classes, there are exactly n SupportVectorMachine elements. Depending on the model attribute maxWins, the SVM with the largest or the smallest value determines the predicted class label.

The attribute alternateTargetCategory is required in case of binary classification models with only one SupportVectorMachine element. It is also required in case of multi-class classification models implementing the one-against-one method. In this method, for n classes, there are exactly n(n-1)/2 SupportVectorMachine elements where each SVM is trained on data from two classes. The first class is represented by the targetCategory attribute and the second class by the alternateTargetCategory attribute. The predicted class label is determined based on a voting scheme in which the category with the maximum number of votes wins. In case of a tie, the predicted class label is the first category with maximal number of votes. For both cases (binary classification and multi-class classification with one-against-one), the corresponding class labels are determined by comparing the numeric prediction with the threshold. If maxWins is true and the prediction is larger than the threshold or maxWins is false and the prediction is smaller than the threshold, the class label is the targetCategory attribute, otherwise, it is the alternateTargetCategory attribute.

Note that each SupportVectorMachine element may have its own threshold that overrides the default.
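The voting scheme described above can be sketched as follows; `BinarySvm` and `predictClass` are illustrative names, not the library's types:

```scala
// Sketch of the one-against-one voting scheme: each binary SVM votes for its
// targetCategory or its alternateTargetCategory depending on its numeric
// prediction, its threshold, and maxWins; the category with the most votes
// wins, with ties broken in favor of the first category seen.
final case class BinarySvm(target: String, alternate: String,
                           threshold: Double, predict: Array[Double] => Double)

def predictClass(svms: Seq[BinarySvm], x: Array[Double], maxWins: Boolean): String = {
  val votes = svms.map { s =>
    val p = s.predict(x)
    if ((maxWins && p > s.threshold) || (!maxWins && p < s.threshold)) s.target
    else s.alternate
  }
  // maxBy keeps the first maximal element, matching the tie-break rule.
  votes.distinct.maxBy(c => votes.count(_ == c))
}

// Three classes give n(n-1)/2 = 3 pairwise SVMs (constant predictions here,
// purely for illustration).
val pairwise = Seq(
  BinarySvm("setosa", "versicolor", 0.0, _ => 1.0),   // votes setosa
  BinarySvm("setosa", "virginica",  0.0, _ => -1.0),  // votes virginica
  BinarySvm("versicolor", "virginica", 0.0, _ => 1.0) // votes versicolor
)
```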

class SupportVectorMachineAttributes(val functionName: MiningFunction, val threshold: Double, val svmRepresentation: SVMRepresentation, val classificationMethod: SVMClassificationMethod, val maxWins: Boolean, val modelName: Option[String], val algorithmName: Option[String], val isScorable: Boolean) extends ModelAttributes with HasSupportVectorMachineAttributes
class SupportVectorMachineModel(var parent: Model, val attributes: SupportVectorMachineAttributes, val miningSchema: MiningSchema, val kernelType: KernelType, val vectorDictionary: VectorDictionary, val supportVectorMachines: Array[SupportVectorMachine], val output: Option[Output], val targets: Option[Targets], val localTransformations: Option[LocalTransformations], val modelStats: Option[ModelStats], val modelExplanation: Option[ModelExplanation], val modelVerification: Option[ModelVerification], val extensions: Seq[Extension]) extends Model with HasWrappedSupportVectorMachineAttributes

Support Vector Machine models for classification and regression are considered. A Support Vector Machine is a function f which is defined in the space spanned by the kernel basis functions K(x,xi) of the support vectors xi: f(x) = Sum_{i=1}^{n} αi*K(x,xi) + b.

Here n is the number of all support vectors, αi are the basis coefficients and b is the absolute coefficient. In an equivalent interpretation, n could also be considered as the total number of all training vectors xi. Then the support vectors are the subset of all those vectors xi whose coefficients αi are greater than zero. The term Support Vector (SV) has also a geometrical interpretation because these vectors really support the discrimination function f(x) = 0 in the mechanical interpretation.
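The decision function can be written out directly with a pluggable kernel; a linear kernel is used in the check below (a sketch, not the library's scoring code):

```scala
// f(x) = Sum_{i=1..n} alpha_i * K(x, x_i) + b, where alphas are the basis
// coefficients, supportVectors are the x_i, and b is the absolute coefficient.
def linearKernel(x: Array[Double], y: Array[Double]): Double =
  x.zip(y).map { case (a, b) => a * b }.sum

def decision(alphas: Array[Double], supportVectors: Array[Array[Double]], b: Double,
             kernel: (Array[Double], Array[Double]) => Double)(x: Array[Double]): Double =
  alphas.zip(supportVectors).map { case (alpha, xi) => alpha * kernel(x, xi) }.sum + b
```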

class SupportVectors(val supportVectors: Array[SupportVector]) extends PmmlElement

Contains all support vectors required for the respective SVM instance.

class TargetValueCount(val value: Any, val count: Double) extends PmmlElement
class TargetValueCounts(val targetValueCounts: Array[TargetValueCount]) extends PmmlElement

Lists the counts associated with each value of the target field; however, a TargetValueCount whose count is zero may be omitted. Within BayesOutput, TargetValueCounts lists the total count of occurrences of each target value. Within PairCounts, TargetValueCounts lists, for each target value, the count of the joint occurrences of that target value with a particular discrete input value.

class TargetValueStat(val value: Any, val distribution: ContinuousDistribution) extends PmmlElement

Used for a continuous input field Ii to define statistical measures associated with each value of the target field. As defined in CONTINUOUS-DISTRIBUTION-TYPES, different distribution types can be used to represent such measures. For Bayes models, these are restricted to Gaussian and Poisson distributions.

class TargetValueStats(val targetValueStats: Array[TargetValueStat]) extends PmmlElement

Serves as the envelope for element TargetValueStat.

class TrainingInstances(val instanceFields: InstanceFields, val table: Table, val isTransformed: Boolean, val recordCount: Option[Int], val fieldCount: Option[Int]) extends PmmlElement

Encapsulates the definition of the fields included in the training instances as well as their values.

Value parameters:
fieldCount

Defines the number of fields (features + targets). This number needs to match the number of InstanceField elements defined under InstanceFields.

instanceFields

Defines all the fields included in the training instances.

isTransformed

A flag indicating whether the training instances have already been transformed: "false" means the training data has not been transformed yet; "true" means it has.

recordCount

Defines the number of training instances or records. This number needs to match the number of instances defined in the element InlineTable or in the external data if TableLocator is used.

table

Represents the training data (feature vectors and class labels).

class TransformationModel(val version: String, val header: Header, val dataDictionary: DataDictionary, val transformationDictionary: Option[TransformationDictionary]) extends DataModel
class TreeAttributes(val functionName: MiningFunction, val modelName: Option[String], val algorithmName: Option[String], val isScorable: Boolean, val missingValueStrategy: MissingValueStrategy, val missingValuePenalty: Double, val noTrueChildStrategy: NoTrueChildStrategy, val splitCharacteristic: SplitCharacteristic) extends ModelAttributes with HasTreeAttributes

Holds attributes of a Tree model

class TreeModel(var parent: Model, val attributes: TreeAttributes, val miningSchema: MiningSchema, val node: Node, val output: Option[Output], val targets: Option[Targets], val localTransformations: Option[LocalTransformations], val modelStats: Option[ModelStats], val modelExplanation: Option[ModelExplanation], val modelVerification: Option[ModelVerification], val extensions: Seq[Extension]) extends Model with HasWrappedTreeAttributes

The TreeModel in PMML allows for defining either a classification or prediction structure. Each Node holds a logical predicate expression that defines the rule for choosing the Node or any of the branching Nodes.
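Scoring such a tree amounts to descending from the root; a simplified sketch (the `TNode` case class is an illustration, not the library's `Node`, and the missing-value and no-true-child strategies are omitted):

```scala
// Descend from the root, at each node following the first child whose
// predicate is true for the input, and return the score of the node
// where the descent stops.
final case class TNode(score: Option[String], predicate: Map[String, Double] => Boolean,
                       children: Seq[TNode] = Nil)

@annotation.tailrec
def traverse(node: TNode, in: Map[String, Double]): Option[String] =
  node.children.find(_.predicate(in)) match {
    case Some(child) => traverse(child, in)
    case None        => node.score
  }

// Tiny example tree: root with two leaves, the second acting as a catch-all.
val tree = TNode(None, _ => true, Seq(
  TNode(Some("yes"), m => m("x") > 0.0),
  TNode(Some("no"),  _ => true)
))
```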

class VariableWeight(val field: Field) extends PmmlElement
class VectorDictionary(val vectorFields: VectorFields, val vectorInstances: Array[VectorInstance]) extends PmmlElement

Contains the set of support vectors, which are of the type VectorInstance.

class VectorFields(val vectorFields: Array[DoubleEvaluator]) extends PmmlElement

Defines which entries in the vectors correspond to which fields.

class VectorInstance(val id: String, val array: Vector[Double]) extends PmmlElement

A data vector given in dense or sparse array format. The order of the values corresponds to that of the VectorFields. The sizes of the sparse arrays must match the number of fields included in the VectorFields element.
