org.pmml4s.metadata
Type members
Classlikes
Abstract class for a field in a PMML model, with common implementations.
Specifies which scoring algorithm to use when computing the output value. It applies only to Association Rules models.
- Companion: object
If a regression model should predict integers, use the attribute castInteger to control how decimal places are handled (round, ceiling or floor).
- Companion: object
- Companion: object
Contains definitions for fields as used in mining models. It specifies the types and value ranges. These definitions are assumed to be independent of specific data sets as used for training or scoring a specific model.
- Companion: object
Defines a field as used in mining models. It specifies the types and value ranges.
The Decisions element contains an element Decision for every possible value of the decision.
Abstract class for a field in a PMML model.
The Output section in the model specifies names for columns in an output table and describes how to compute the corresponding values.
This field specifies how invalid input values are handled.
- returnInvalid is the default and specifies that, when an invalid input is encountered, the model should return a value indicating that the result is invalid.
- asIs means to use the input without modification.
- asMissing specifies that an invalid input value should be treated as a missing value and follow the behavior specified by the missingValueReplacement attribute, if present. If asMissing is specified but there is no respective missingValueReplacement, a missing value is passed on for eventual handling by successive transformations via DerivedFields or in the actual mining model.
- asValue specifies that an invalid input value should be replaced with the value specified by attribute invalidValueReplacement which must be present in this case, or the PMML is invalid.
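The four treatments amount to a small decision applied to an input that has already been judged invalid. The Scala sketch below is illustrative only and is not the pmml4s API: the object `InvalidSketch`, the `Cell` result type and `handleInvalid` are hypothetical names, and the optional `missingValueReplacement` argument stands in for the MiningField attribute of the same name.

```scala
// Illustrative model of invalidValueTreatment; hypothetical names, not the pmml4s API.
object InvalidSketch {
  // Outcome of preparing one input value.
  sealed trait Cell
  final case class Value(x: Double) extends Cell
  case object Missing extends Cell
  case object InvalidResult extends Cell

  sealed trait Treatment
  case object ReturnInvalid extends Treatment
  case object AsIs extends Treatment
  case object AsMissing extends Treatment
  // asValue carries invalidValueReplacement, so the replacement cannot be absent.
  final case class AsValue(invalidValueReplacement: Double) extends Treatment

  /** Handles an input already known to be invalid. */
  def handleInvalid(raw: Double,
                    treatment: Treatment,
                    missingValueReplacement: Option[Double]): Cell =
    treatment match {
      case ReturnInvalid => InvalidResult // flag the model result as invalid
      case AsIs          => Value(raw)    // use the input without modification
      case AsMissing     =>               // treat as missing, then apply the replacement if any
        missingValueReplacement match {
          case Some(r) => Value(r)
          case None    => Missing
        }
      case AsValue(r)    => Value(r)      // substitute invalidValueReplacement
    }
}
```

Modelling asValue as a case class that carries its replacement makes the "must be present, or the PMML is invalid" rule impossible to violate at the type level.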
MiningFields also define the usage of each field (active, supplementary, target, ...) as well as policies for treating missing, invalid or outlier values.
- Value parameters:
- importance: States the relative importance of the field.
- invalidValueTreatment: Specifies how invalid input values are handled.
- missingValueReplacement: If this attribute is specified, a missing input value is automatically replaced by the given value; that is, the model itself works as if the given value had been found in the original input. For example, the surrogate operator in TreeModel does not apply if the MiningField specifies a replacement value.
- missingValueTreatment: Indicates how missingValueReplacement was derived. It is for information only, except that the value returnInvalid requires the model to return an invalid result when a missing value is encountered.
- name: Symbolic name of the field; it must refer to a field in the scope of the parent of the MiningSchema's model element.
- opType: Overrides the corresponding value in the DataField, so a DataField can be used with different optypes in different models. For example, a 0/1 indicator could be used as a numeric input field in a regression model while the same field is used as a categorical field in a tree model.
The MiningSchema is the Gate Keeper for its model element. All data entering a model must pass through the MiningSchema. Each model element contains one MiningSchema which lists fields as used in that model. While the MiningSchema contains information that is specific to a certain model, the DataDictionary contains data definitions which do not vary per model. The main purpose of the MiningSchema is to list the fields that have to be provided in order to apply the model.
In a PMML consumer this field is for information only, unless its value is returnInvalid: in that case, if a missing value is encountered in the given field, the model should return a value indicating an invalid result. Otherwise, the consumer only looks at missingValueReplacement; if a value is present, it replaces missing values. Apart from returnInvalid, the missingValueTreatment attribute only indicates how missingValueReplacement was derived and places no behavioral requirement on the consumer.
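A minimal sketch of this consumer-side rule, assuming missing inputs are modelled as `None`. `MissingSketch`, `prepare` and the `returnInvalid` flag are hypothetical names, not part of pmml4s. Only returnInvalid changes behavior; every other treatment falls through to missingValueReplacement.

```scala
// Hypothetical sketch of consumer handling of a missing input; not the pmml4s API.
object MissingSketch {
  sealed trait Cell
  final case class Value(x: Double) extends Cell
  case object Missing extends Cell
  case object InvalidResult extends Cell

  /** `returnInvalid` models missingValueTreatment="returnInvalid"; any other
    * treatment (asIs, asMean, ...) is informational only for the consumer. */
  def prepare(input: Option[Double],
              returnInvalid: Boolean,
              missingValueReplacement: Option[Double]): Cell =
    input match {
      case Some(x)               => Value(x)      // present values pass through unchanged
      case None if returnInvalid => InvalidResult // the only treatment with a behavioral effect
      case None =>
        missingValueReplacement match {
          case Some(r) => Value(r)                // the replacement value wins if present
          case None    => Missing                 // left for later handling
        }
    }
}
```

Note that returnInvalid is checked before missingValueReplacement, so a replacement value is ignored when the treatment is returnInvalid.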
Outliers
- asIs: field values are treated at face value.
- asMissingValues: outlier values are treated as if they were missing.
- asExtremeValues: outlier values are changed to a specific high or low value defined in the MiningField (its highValue and lowValue attributes).
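The three outlier treatments can be sketched as follows, assuming an outlier is a value below a lower bound or above an upper bound taken from the MiningField. `OutlierSketch`, `treat`, `low` and `high` are hypothetical names for illustration, not the pmml4s API; a result of `None` means the value is now treated as missing.

```scala
// Hypothetical sketch of the outliers treatments; not the pmml4s API.
object OutlierSketch {
  sealed trait Treatment
  case object AsIs extends Treatment
  case object AsMissingValues extends Treatment
  case object AsExtremeValues extends Treatment

  /** Returns Some(value) to use, or None when the value becomes missing.
    * `low` and `high` stand for the MiningField lowValue/highValue bounds. */
  def treat(x: Double, t: Treatment, low: Double, high: Double): Option[Double] =
    t match {
      case AsIs => Some(x)                                   // face value
      case AsMissingValues =>
        if (x < low || x > high) None else Some(x)           // outliers become missing
      case AsExtremeValues =>
        Some(math.min(math.max(x, low), high))               // clamp into [low, high]
    }
}
```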
The Output element describes a set of result values that can be returned from a model.
OutputField elements specify names, types and rules for calculating specific result features. This information can be used while writing an output table.
- Companion: object
Applies only to Association Rules and is used to specify which criterion is used to sort the output result. For instance, the result could be sorted by the confidence, support or lift of the rules.
Determines the sorting order when ranking the results. The default behavior (rankOrder="descending") indicates that the result with the highest rank will appear first on the sorted list.
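Ranking can be sketched as a sort keyed by the chosen rank basis. This is a hypothetical illustration, not the pmml4s API: `RankSketch`, `Rule` and `rank` are made-up names, and the basis is passed as a function such as `_.confidence` or `_.lift`. The default `descending = true` mirrors the default rankOrder described above.

```scala
// Hypothetical sketch of rankBasis/rankOrder for association rules; not the pmml4s API.
object RankSketch {
  final case class Rule(id: String, confidence: Double, support: Double, lift: Double)

  /** Sorts rules by the chosen criterion; descending puts the highest rank first. */
  def rank(rules: Seq[Rule], basis: Rule => Double, descending: Boolean = true): Seq[Rule] =
    if (descending) rules.sortWith((a, b) => basis(a) > basis(b))
    else rules.sortWith((a, b) => basis(a) < basis(b))
}
```

Using `sortWith` with an explicit comparison avoids depending on an implicit `Ordering[Double]`.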
Specifies which feature of an association rule to return. This attribute has been deprecated as of PMML 4.2. The rule feature values can now be specified in the feature attribute.
Note that castInteger, min, max, rescaleConstant and rescaleFactor only apply to models of type regression. Furthermore, they must be applied in sequence, which is:
- min and max
- rescaleFactor
- rescaleConstant
- castInteger
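The following Scala sketch applies these regression post-processing steps in that order. It is an illustration under stated assumptions, not the pmml4s implementation: `TargetSketch`, `postProcess` and the `CastInteger` cases are hypothetical names; the defaults (no bounds, factor 1, constant 0) model absent attributes; and `Round` uses `math.round`, which sends halves toward positive infinity, an assumption about tie-breaking.

```scala
// Hypothetical sketch of Target post-processing for regression; not the pmml4s API.
object TargetSketch {
  sealed trait CastInteger
  case object Round extends CastInteger
  case object Ceiling extends CastInteger
  case object Floor extends CastInteger

  /** Applies, in order: min and max, rescaleFactor, rescaleConstant, castInteger. */
  def postProcess(y: Double,
                  min: Option[Double] = None,
                  max: Option[Double] = None,
                  rescaleFactor: Double = 1.0,
                  rescaleConstant: Double = 0.0,
                  castInteger: Option[CastInteger] = None): Double = {
    val atLeastMin = min.fold(y)(lo => math.max(y, lo))                   // raise values below min
    val clipped    = max.fold(atLeastMin)(hi => math.min(atLeastMin, hi)) // lower values above max
    val rescaled   = clipped * rescaleFactor + rescaleConstant            // multiply first, then add
    castInteger match {
      case Some(Round)   => math.round(rescaled).toDouble // halves go toward +infinity (assumption)
      case Some(Ceiling) => math.ceil(rescaled)
      case Some(Floor)   => math.floor(rescaled)
      case None          => rescaled
    }
  }
}
```

Because clipping happens before rescaling, `postProcess(12.0, max = Some(10.0), rescaleFactor = 2.0)` yields 20.0 rather than 10.0.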
- Value parameters:
- castInteger: If a regression model should predict integers, use the attribute castInteger to control how decimal places are handled.
- field: Must refer to the name of a DataField or DerivedField. It can be absent when the model is used inside a Segment of a MiningModel and does not have a real target field in the input data.
- max: If max is present, a predicted value larger than max is replaced by max.
- min: If min is present, a predicted value smaller than min is replaced by min.
- optype: When Target specifies optype, it overrides the optype attribute in a corresponding MiningField, if it exists. If the Target does not specify optype, the MiningField is used as the default; in turn, if the MiningField does not specify an optype, it is taken from the corresponding DataField. In other words, a MiningField overrides a DataField, and a Target overrides a MiningField.
- rescaleConstant: Used with rescaleFactor for a simple rescale of the predicted value: after the predicted value has been multiplied by rescaleFactor, rescaleConstant is added to it.
- rescaleFactor: Used with rescaleConstant for a simple rescale of the predicted value: first the predicted value is multiplied by rescaleFactor, then rescaleConstant is added.
- targetValues: In classification models, TargetValue is required. For regression models, TargetValue is optional.
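The optype precedence rule reduces to a chain of fallbacks. In this hypothetical sketch (`OpTypeSketch` and `effectiveOpType` are illustrative names, not the pmml4s API), an absent attribute is `None`, and the DataField optype is assumed always present.

```scala
// Hypothetical sketch of optype resolution; not the pmml4s API.
object OpTypeSketch {
  sealed trait OpType
  case object Categorical extends OpType
  case object Ordinal extends OpType
  case object Continuous extends OpType

  /** Target overrides MiningField, which overrides DataField. */
  def effectiveOpType(target: Option[OpType],
                      miningField: Option[OpType],
                      dataField: OpType): OpType =
    target.orElse(miningField).getOrElse(dataField)
}
```

For the 0/1 indicator mentioned for MiningField, a DataField declared continuous would resolve to Categorical in a tree model whose MiningField sets `Some(Categorical)`, and stay Continuous in a regression model that sets nothing.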
- Value parameters:
- defaultValue: The counterpart of prior probabilities for continuous fields. Usually the value is the mean of the target values in the training data. The attribute defaultValue is used only if the optype of the field is continuous.
- displayValue: A usually more readable version of the value, which PMML consumers can use to display values in scoring results or other applications.
- priorProbability: Specifies a default probability for the corresponding target category. It is used if the prediction logic itself did not produce a result. The attribute priorProbability is used only if the optype of the field is categorical or ordinal.
- value: Corresponds to the class labels in a classification model.
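A sketch of how a consumer might fall back on these attributes when the prediction logic produces no result. It uses hypothetical names (`TargetValueSketch`, `categoricalResult`, `continuousResult`), models only `value`, `priorProbability` and `defaultValue`, and is not the pmml4s API.

```scala
// Hypothetical sketch of TargetValue fallbacks; not the pmml4s API.
object TargetValueSketch {
  final case class TargetValue(value: Option[String] = None,
                               priorProbability: Option[Double] = None,
                               defaultValue: Option[Double] = None)

  /** Categorical or ordinal target: use the model's distribution, else the priors. */
  def categoricalResult(predicted: Option[Map[String, Double]],
                        targetValues: Seq[TargetValue]): Map[String, Double] =
    predicted.getOrElse {
      targetValues.collect {
        case TargetValue(Some(label), Some(p), _) => label -> p
      }.toMap
    }

  /** Continuous target: use the model's value, else the first defaultValue. */
  def continuousResult(predicted: Option[Double],
                       targetValues: Seq[TargetValue]): Option[Double] =
    predicted.orElse(targetValues.flatMap(_.defaultValue).headOption)
}
```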
Usage type
- active: field used as input (independent field).
- target: field that was used as a training target for supervised models.
- predicted: field whose value is predicted by the model. As of PMML 4.2, this usage type is deprecated and has been replaced by the usage type target.
- supplementary: field holding additional descriptive information. Supplementary fields are not required to apply a model; they are provided as additional information for explanatory purposes. When a field has gone through preprocessing transformations before a model is built, an additional supplementary field is typically used to describe the statistics of the original field values.
- group: field similar to the SQL GROUP BY. For example, this is used by AssociationModel and SequenceModel to group items into transactions by customerID or by transactionID.
- order: This field defines the order of items or transactions and is currently used in SequenceModel and TimeSeriesModel. Similarly to group, it is motivated by the SQL syntax, namely by the ORDER BY statement.
- frequencyWeight and analysisWeight: These fields are not needed for scoring, but provide very important information on how the model was built. Frequency weight usually has positive integer values and is sometimes called "replication weight"; its values can be interpreted as the number of times each record appears in the data. Analysis weight can have fractional positive values; it could be used as a regression weight in regression models or as a case weight in trees, and can be interpreted as the differing importance of the cases in the model. Counts in ModelStats and Partitions can be computed using the frequency weight, while mean and standard deviation values can be computed using both weights.
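Since target, supplementary and weight fields are described above as not needed for scoring, the inputs a caller must supply can be sketched as the active, group and order fields. This is an assumption drawn from those descriptions, not the pmml4s API; `UsageSketch`, `SchemaField`, `requiredInputs` and `targets` are hypothetical names, and the deprecated predicted usage is folded into target.

```scala
// Hypothetical sketch of usage types; not the pmml4s API.
object UsageSketch {
  sealed trait UsageType
  case object Active extends UsageType
  case object Target extends UsageType
  case object Predicted extends UsageType // deprecated since PMML 4.2 in favor of Target
  case object Supplementary extends UsageType
  case object Group extends UsageType
  case object Order extends UsageType
  case object FrequencyWeight extends UsageType
  case object AnalysisWeight extends UsageType

  // usage defaults to Active, mirroring the PMML default usageType.
  final case class SchemaField(name: String, usage: UsageType = Active)

  /** Assumed scoring inputs: active, group and order fields. */
  def requiredInputs(schema: Seq[SchemaField]): Seq[String] =
    schema.collect { case SchemaField(n, Active | Group | Order) => n }

  /** Target fields, folding the deprecated predicted usage into target. */
  def targets(schema: Seq[SchemaField]): Seq[String] =
    schema.collect { case SchemaField(n, Target | Predicted) => n }
}
```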
Defines a wrapper field that contains an internal field and delegates all operations to it.