package transformations
At various places the mining models use simple functions in order to map user data to values that are easier to use in the specific model. For example, neural networks internally work with numbers, usually in the range from 0 to 1. Numeric input data are mapped to the range [0..1], and categorical fields are mapped to series of 0/1 indicators.
PMML defines various kinds of simple data transformations:
- Normalization: map values to numbers, the input can be continuous or discrete.
- Discretization: map continuous values to discrete values.
- Value mapping: map discrete values to discrete values.
- Text Indexing: derive a frequency-based value for a given term.
- Functions: derive a value by applying a function to one or more parameters.
- Aggregation: summarize or collect groups of values, e.g., compute average.
- Lag: use a previous value of the given input field.
Type Members
- class Apply extends Expression
Apply defines the application of a function. The function itself is identified by name through the function attribute. The actual parameters of the function application are given in the content of the element. Each actual argument value is given by an EXPRESSION and is mapped by position to the corresponding formal parameter in the function definition.
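Positional binding can be illustrated with a minimal sketch in plain Scala. It is not the pmml4s API: the `Expr`, `Const`, `Field` and `Apply` case classes, the `functions` table and `eval` are hypothetical names chosen for this example, and only numeric values are modeled.

```scala
// Hypothetical mini expression tree for illustration only; these are not the pmml4s classes.
sealed trait Expr
final case class Const(value: Double) extends Expr
final case class Field(name: String) extends Expr
final case class Apply(function: String, args: List[Expr]) extends Expr

object ApplySketch {
  // Hypothetical function table: xs(i) is the i-th actual argument,
  // bound to the i-th formal parameter of the function.
  val functions: Map[String, List[Double] => Double] = Map(
    "-"   -> ((xs: List[Double]) => xs(0) - xs(1)),
    "pow" -> ((xs: List[Double]) => math.pow(xs(0), xs(1)))
  )

  def eval(e: Expr, row: Map[String, Double]): Double = e match {
    case Const(v)        => v
    case Field(name)     => row(name)
    // Evaluate each argument expression in document order, then call the function.
    case Apply(fn, args) => functions(fn)(args.map(eval(_, row)))
  }
}
```

Swapping the two arguments of `"-"` changes the result, because each argument is bound to a formal parameter purely by its position.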
- trait BinaryArithmetic extends BinaryFunction
- trait BinaryBoolean extends BinaryFunction
- trait BinaryCompare extends BinaryFunction
- trait BinaryFunction extends Function
- trait BinaryString extends BinaryFunction
- class Constant extends LeafExpression
Constant values can be used in expressions that have multiple arguments. The actual value of a constant is given by the content of the element. For example, <Constant>1.05</Constant> represents the number 1.05. The dataType of a Constant can optionally be specified.
- class DefineFunction extends Function with HasOpType with HasDataType with PmmlElement
Defines new (user-defined) functions as variations or compositions of existing functions or transformations. The function's name must be unique: it must not conflict with any other function name, whether built into PMML or user-defined. The EXPRESSION in the content of DefineFunction is the function body that actually defines the meaning of the new function. The function body must not refer to fields other than the parameter fields.
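A minimal sketch of how a call to a DefineFunction could be evaluated, assuming purely numeric parameters. The case classes, the `builtIns` table and `call` are hypothetical and do not reflect how pmml4s implements DefineFunction; the sketch only shows positional binding of actual arguments to ParameterField names and the rule that the body may refer to parameter fields only.

```scala
// Hypothetical mini expression tree for illustration; not the pmml4s classes.
sealed trait Expr
final case class Const(value: Double) extends Expr
final case class FieldRef(field: String) extends Expr
final case class Apply(function: String, args: List[Expr]) extends Expr

// A user-defined function: ordered formal parameter names plus a body expression.
final case class DefineFunction(name: String, params: List[String], body: Expr)

object DefineFunctionSketch {
  // Hypothetical built-in functions that a body may call.
  val builtIns: Map[String, List[Double] => Double] = Map(
    "+" -> ((xs: List[Double]) => xs(0) + xs(1)),
    "*" -> ((xs: List[Double]) => xs(0) * xs(1))
  )

  def call(fn: DefineFunction, actuals: List[Double]): Double = {
    require(actuals.length == fn.params.length, s"${fn.name}: wrong number of arguments")
    // Bind the i-th actual argument to the i-th ParameterField name.
    val env: Map[String, Double] = fn.params.zip(actuals).toMap

    def eval(e: Expr): Double = e match {
      case Const(v) => v
      // The body must not refer to fields other than the parameter fields.
      case FieldRef(f) =>
        env.getOrElse(f, throw new IllegalArgumentException(s"${fn.name}: '$f' is not a parameter"))
      case Apply(op, args) => builtIns(op)(args.map(eval))
    }

    eval(fn.body)
  }
}
```

For example, a body `a * x + b` with parameters `(x, a, b)` called with `(2.0, 3.0, 1.0)` evaluates to 7.0, while a body that references a non-parameter field fails.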
- class DerivedField extends DataField with Expression
Provides a common element for the various mappings. Derived fields can also appear at several places in the definition of specific models, such as neural network or Naive Bayes models. Transformed fields have a name so that statistics and the model can refer to them.
- class Discretize extends FieldExpression
Discretization of numerical input fields is a mapping from continuous to discrete values using intervals.
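A minimal sketch of interval-based discretization in plain Scala. `Bin` and `discretize` are hypothetical names; every interval is assumed to be closed on the left and open on the right (PMML intervals can use other closures), the first matching bin wins, and the sketch returns `None` when no bin matches.

```scala
// Hypothetical bin: values in [left, right) map to `label`.
final case class Bin(label: String, left: Double, right: Double) {
  def contains(x: Double): Boolean = x >= left && x < right
}

object DiscretizeSketch {
  // Return the label of the first bin whose interval contains x, or None if no bin matches.
  def discretize(x: Double, bins: Seq[Bin]): Option[String] =
    bins.find(_.contains(x)).map(_.label)
}
```

With bins `(-inf, 18)`, `[18, 65)` and `[65, +inf)`, the value 18.0 falls into the middle bin because only the left boundary is closed.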
- class DiscretizeBin extends PmmlElement
- trait Expression extends Evaluator with PmmlElement
Base trait for expressions, which define how the values of a new field are computed.
- class FieldColumnPair extends PmmlElement
- trait FieldExpression extends UnaryExpression
- class FieldRef extends FieldExpression with MixedEvaluator
Field references are simply pass-throughs to fields previously defined in the DataDictionary, a DerivedField, or a result field. For example, they are used in clustering models in order to define center coordinates for fields that don't need further normalization.
A missing input will produce a missing result. The optional attribute mapMissingTo may be used to map a missing result to the value specified by the attribute. If the attribute is not present, the result remains missing.
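The pass-through and mapMissingTo behavior can be sketched as follows, assuming a record is a `Map` in which an absent key or a `None` value means "missing". `FieldRefSketch`, `Record` and `fieldRef` are hypothetical names, not pmml4s API.

```scala
object FieldRefSketch {
  // Hypothetical record type: an absent key or a None value means "missing".
  type Record = Map[String, Option[Any]]

  // Pass the referenced value through unchanged; substitute mapMissingTo only when it is missing.
  def fieldRef(record: Record, field: String, mapMissingTo: Option[Any] = None): Option[Any] =
    record.getOrElse(field, None).orElse(mapMissingTo)
}
```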
- trait Function extends PmmlElement
- trait FunctionProvider extends AnyRef
- trait HasFunctionProvider extends AnyRef
- trait HasLocalTransformations extends AnyRef
- trait LeafExpression extends Expression
- class LinearNorm extends PmmlElement
- class LocalTransformations extends TransformationDictionary
LocalTransformations holds derived fields that are local to the model.
- class MapValues extends Expression
Any discrete value can be mapped to any possibly different discrete value by listing the pairs of values. This list is implemented by a table, so it can be given inline as a sequence of XML elements or by a reference to an external table.
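A minimal sketch of a single-column value mapping backed by an inline table. The table contents and the names `MapValuesSketch` and `mapValue` are hypothetical; the sketch assumes that a missing input, or an input not listed in the table, produces a missing result (`None`).

```scala
object MapValuesSketch {
  // Hypothetical inline table: each row pairs one input value with one output value.
  val table: Map[String, String] = Map("M" -> "male", "F" -> "female")

  // A missing input, or a value not listed in the table, yields a missing result (None).
  def mapValue(input: Option[String]): Option[String] = input.flatMap(table.get)
}
```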
- trait MultipleArithmetic extends Function
- trait MultipleBoolean extends Function
- class MutableFunctionProvider extends FunctionProvider
- class NormContinuous extends NumericFieldExpression
Normalization provides a basic framework for mapping input values to specific value ranges, usually the numeric range [0 .. 1]. Normalization is used, e.g., in neural networks and clustering models.
Defines how to normalize an input field by piecewise linear interpolation. The mapMissingTo attribute defines the value the output is to take if the input is missing. If the mapMissingTo attribute is not specified, then missing input values produce a missing result.
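Piecewise linear interpolation between LinearNorm points can be sketched as below. `normalize` and its parameters are hypothetical; points are given as `(orig, norm)` pairs sorted by `orig`, at least two points are assumed, and inputs outside the covered range are linearly extrapolated from the outermost segment. PMML's outliers attribute can select other treatments, which this sketch does not model.

```scala
object NormContinuousSketch {
  // points: LinearNorm (orig, norm) pairs sorted by orig; at least two points are assumed.
  def normalize(points: Seq[(Double, Double)], mapMissingTo: Option[Double])(x: Option[Double]): Option[Double] =
    x match {
      // Missing input: use mapMissingTo if given, otherwise the result stays missing.
      case None => mapMissingTo
      case Some(v) =>
        // Consecutive points form the segments of the piecewise linear function.
        val segments = points.zip(points.tail)
        // First segment whose right end is >= v; beyond the last point, reuse the last segment.
        val ((x0, y0), (x1, y1)) =
          segments.find { case (_, (right, _)) => v <= right }.getOrElse(segments.last)
        Some(y0 + (v - x0) * (y1 - y0) / (x1 - x0))
    }
}
```

With points `(0, 0)`, `(10, 0.5)` and `(20, 1)`, an input of 15 maps to 0.75 and an input of 25 is extrapolated to 1.25.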
- class NormDiscrete extends FieldExpression
Encode string values into numeric values in order to perform mathematical computations. For example, regression and neural network models often split categorical and ordinal fields into multiple dummy fields. This kind of normalization is supported in PMML by the element NormDiscrete.
A NormDiscrete element (f, v) defines a unit whose value is 1.0 if the value of input field f is v, and 0.0 otherwise.
The set of NormDiscrete instances which refer to a certain input field define a fan-out function which maps a single input field to a set of normalized fields.
If the input value is missing and the attribute mapMissingTo is not specified then the result is a missing value as well. If the input value is missing and the attribute mapMissingTo is specified then the result is the value of the attribute mapMissingTo.
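The indicator semantics and the fan-out over all categories of one field can be sketched like this. `unit` and `fanOut` are hypothetical names; categorical values are modeled as `Option[String]`, with `None` meaning missing.

```scala
object NormDiscreteSketch {
  // One NormDiscrete unit for value v: 1.0 if the input equals v, else 0.0.
  // A missing input yields mapMissingTo, or stays missing (None) if that is not given.
  def unit(v: String, mapMissingTo: Option[Double])(input: Option[String]): Option[Double] =
    input match {
      case None    => mapMissingTo
      case Some(x) => Some(if (x == v) 1.0 else 0.0)
    }

  // Fan-out: one unit per category of the same input field.
  def fanOut(categories: Seq[String], input: Option[String]): Seq[Option[Double]] =
    categories.map(c => unit(c, None)(input))
}
```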
- trait NumericFieldExpression extends FieldExpression
- class ParameterField extends AbstractField
- trait TernaryArithmetic extends TernaryFunction
- trait TernaryFunction extends Function
- class TextIndex extends NumericFieldExpression
The TextIndex element fully configures how the text in textField should be processed and translated into a frequency metric for a particular term of interest. The actual frequency metric to be returned is defined through the localTermWeights attribute.
- class TextIndexNormalization extends PmmlElement
A TextIndexNormalization element offers more advanced ways of normalizing text input into a more controlled vocabulary that corresponds to the terms being used in invocations of this indexing function. The normalization operation is defined through a translation table, specified through a TableLocator or InlineTable element.
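A simplified sketch of applying a translation table to map word variants onto a controlled vocabulary. The table contents and names are hypothetical, and the sketch only performs exact, case-sensitive, word-by-word replacement; the PMML element supports richer matching than this.

```scala
object TextNormalizationSketch {
  // Hypothetical translation table (input term -> normalized term).
  val table: Map[String, String] = Map("colour" -> "color", "colours" -> "color", "colors" -> "color")

  // Split on whitespace, replace every word found in the table, keep all other words unchanged.
  def normalize(text: String): String =
    text.split("\\s+").filter(_.nonEmpty).map(w => table.getOrElse(w, w)).mkString(" ")
}
```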
- class TransformationDictionary extends Dictionary[DerivedField] with Transformer with FunctionProvider with PmmlElement
The TransformationDictionary allows for transformations to be defined once and used by any model element in the PMML document.
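To illustrate "defined once, used by any model", here is a sketch in which derived fields are computed once per record and added to it, so that any consumer can refer to them by name. The field names, `derivedFields` and `transform` are hypothetical, the sketch assumes the fields are listed in dependency order, and in PMML each derived field is defined by an expression element rather than a Scala lambda.

```scala
object TransformationDictionarySketch {
  type Record = Map[String, Double]

  // Hypothetical derived fields, assumed to be listed in dependency order.
  val derivedFields: Seq[(String, Record => Double)] = Seq(
    "bmi"          -> ((r: Record) => r("weight") / (r("height") * r("height"))),
    "bmiDeviation" -> ((r: Record) => r("bmi") - 25.0)
  )

  // Compute every derived field once and add it to the record, so that any model
  // (and any later derived field) can refer to it by name.
  def transform(input: Record): Record =
    derivedFields.foldLeft(input) { case (acc, (name, f)) => acc + (name -> f(acc)) }
}
```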
- trait UnaryArithmetic extends UnaryFunction
- trait UnaryBoolean extends UnaryFunction
- trait UnaryExpression extends Expression
- trait UnaryFunction extends Function
- trait UnaryString extends UnaryFunction
Value Members
- object ACos extends UnaryArithmetic
- object ASin extends UnaryArithmetic
- object ATan extends UnaryArithmetic
- object Abs extends UnaryArithmetic
- object Add extends BinaryArithmetic
- object And extends MultipleBoolean
- object Avg extends MultipleArithmetic
- object BuiltInFunctions extends FunctionProvider
- object Ceil extends UnaryArithmetic
- object Concat extends Function
- object Cos extends UnaryArithmetic
- object CosH extends UnaryArithmetic
- object CountHits extends Enumeration
  - allHits: count all hits.
  - bestHits: count all hits with the lowest Levenshtein distance.
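The difference between the two counting modes can be sketched with a standard Levenshtein distance. `countHits`, `maxDistance` and `bestHitsOnly` are hypothetical names; the sketch assumes a token counts as a hit when its edit distance to the term is at most `maxDistance`.

```scala
object CountHitsSketch {
  // Standard dynamic-programming Levenshtein distance, keeping a single row.
  def levenshtein(a: String, b: String): Int = {
    val row = Array.range(0, b.length + 1) // distances from "" to each prefix of b
    for (i <- 1 to a.length) {
      var diag = row(0) // row(j - 1) of the previous row
      row(0) = i
      for (j <- 1 to b.length) {
        val above = row(j) // row(j) of the previous row
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        row(j) = math.min(math.min(above + 1, row(j - 1) + 1), diag + cost)
        diag = above
      }
    }
    row(b.length)
  }

  // allHits: every token within maxDistance; bestHits: only those at the smallest distance found.
  def countHits(tokens: Seq[String], term: String, maxDistance: Int, bestHitsOnly: Boolean): Int = {
    val distances = tokens.map(levenshtein(_, term)).filter(_ <= maxDistance)
    if (distances.isEmpty) 0
    else if (bestHitsOnly) {
      val best = distances.min
      distances.count(_ == best)
    } else distances.size
  }
}
```

In the text "the brown fox brawn frown brown" with a maximum distance of 1, allHits counts four hits for "brown" (two exact matches plus "brawn" and "frown"), whereas bestHits counts only the two exact matches.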
- object DateDaysSinceYear extends BinaryFunction
- object DateSecondsSinceMidnight extends UnaryFunction
- object DateSecondsSinceYear extends BinaryFunction
- object Divide extends BinaryArithmetic
- object Equal extends BinaryBoolean
- object Erf extends UnaryArithmetic
- object Exp extends UnaryArithmetic
- object Expm1 extends UnaryArithmetic
- object Expression extends Serializable
- object Floor extends UnaryArithmetic
- object FormatDatetime extends BinaryFunction
- object FormatNumber extends BinaryFunction
- object GreaterOrEqual extends BinaryCompare
- object GreaterThan extends BinaryCompare
- object Hypot extends BinaryArithmetic
- object If extends Function
- object IsIn extends Function
- object IsMissing extends UnaryBoolean
- object IsNotIn extends Function
- object IsNotMissing extends UnaryBoolean
- object IsNotValid extends UnaryBoolean
- object IsValid extends UnaryBoolean
- object LessOrEqual extends BinaryCompare
- object LessThan extends BinaryCompare
- object Ln extends UnaryArithmetic
- object Ln1p extends UnaryArithmetic
- object LocalTermWeights extends Enumeration
  - termFrequency: use the number of times the term occurs in the document, $x = \mathit{freq}_i$.
  - binary: use 1 if the term occurs in the document or 0 if it doesn't, $x = \chi(\mathit{freq}_i)$.
  - logarithmic: take the base-10 logarithm of 1 plus the number of times the term occurs in the document, $x = \log_{10}(1 + \mathit{freq}_i)$.
  - augmentedNormalizedTermFrequency: add to the binary frequency a "normalized" component expressing the frequency of the term relative to the highest frequency of any term observed in that document, $x = 0.5 \cdot \left(\chi(\mathit{freq}_i) + \frac{\mathit{freq}_i}{\max_k \mathit{freq}_k}\right)$.
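The four local term weights can be transcribed directly into code. The method `weight` and its string-based dispatch are hypothetical; `freqs` holds the term counts for one document, and a term absent from `freqs` is assumed to have frequency 0.

```scala
object LocalTermWeightsSketch {
  // freqs: number of occurrences of each term in one document.
  def weight(scheme: String, freqs: Map[String, Int], term: String): Double = {
    val f = freqs.getOrElse(term, 0).toDouble
    val chi = if (f > 0) 1.0 else 0.0 // χ(freq_i): 1 if the term occurs, else 0
    scheme match {
      case "termFrequency" => f
      case "binary"        => chi
      case "logarithmic"   => math.log10(1 + f)
      case "augmentedNormalizedTermFrequency" =>
        val maxFreq = if (freqs.isEmpty) 0.0 else freqs.values.max.toDouble
        // If no term occurs at all, freq_i is 0 as well, so return 0 instead of dividing by 0.
        if (maxFreq == 0.0) 0.0 else 0.5 * (chi + f / maxFreq)
      case other => throw new IllegalArgumentException(s"unknown localTermWeights value: $other")
    }
  }
}
```

For a document with counts `data = 3`, `model = 1` and `pmml = 9`, the augmented normalized weight of "data" is 0.5 * (1 + 3/9) = 2/3.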
- object Log10 extends UnaryArithmetic
- object Lowercase extends UnaryString
- object Matches extends BinaryBoolean
- object Max extends MultipleArithmetic
- object Median extends MultipleArithmetic
- object Min extends MultipleArithmetic
- object Modulo extends BinaryArithmetic
- object Multiply extends BinaryArithmetic
- object NormalCDF extends TernaryArithmetic
- object NormalIDF extends TernaryArithmetic
- object NormalPDF extends TernaryArithmetic
- object Not extends UnaryFunction
- object NotEqual extends BinaryBoolean
- object Or extends MultipleBoolean
- object Pow extends BinaryArithmetic
- object Product extends MultipleArithmetic
- object RInt extends UnaryArithmetic
- object Replace extends TernaryFunction
- object Round extends UnaryArithmetic
- object SAS-EM-String-Normalize extends BinaryFunction
    <DefineFunction name="SAS-EM-String-Normalize" optype="categorical" dataType="string">
      <ParameterField name="FMTWIDTH" optype="continuous"/>
      <ParameterField name="AnyCInput" optype="categorical"/>
      <Apply function="trimBlanks">
        <Apply function="uppercase">
          <Apply function="substring">
            <FieldRef field="AnyCInput"/>
            <Constant>1</Constant>
            <Constant>FMTWIDTH</Constant>
          </Apply>
        </Apply>
      </Apply>
    </DefineFunction>
- object SAS-FORMAT-$CHARw extends BinaryFunction
    <DefineFunction name="SAS-FORMAT-$CHARw" optype="categorical" dataType="string">
      <ParameterField name="FMTWIDTH" optype="continuous"/>
      <ParameterField name="AnyCInput" optype="continuous"/>
      <Apply function="substring">
        <FieldRef field="AnyCInput"/>
        <Constant>1</Constant>
        <Constant>FMTWIDTH</Constant>
      </Apply>
    </DefineFunction>
- object SAS-FORMAT-BESTw extends BinaryFunction
    <DefineFunction name="SAS-FORMAT-BESTw" optype="categorical" dataType="string">
      <ParameterField name="FMTWIDTH" optype="continuous"/>
      <ParameterField name="AnyNInput" optype="continuous"/>
      <Apply function="formatNumber">
        <FieldRef field="AnyNInput"/>
        <Constant>FMTWIDTH</Constant>
      </Apply>
    </DefineFunction>
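Read as ordinary code, the SAS-EM-String-Normalize and SAS-FORMAT-$CHARw definitions amount to a few string operations. This sketch is a hypothetical transcription, not the pmml4s implementation: it assumes that PMML `substring` takes a 1-based start position and a length, that `trimBlanks` removes leading and trailing spaces, and that the `<Constant>FMTWIDTH</Constant>` argument is meant to be the value of the FMTWIDTH parameter. SAS-FORMAT-BESTw is left out because it depends on the exact semantics of `formatNumber`.

```scala
import java.util.Locale

object SasFunctionsSketch {
  // PMML-style substring: 1-based start position and a length, clamped to the string bounds here.
  def substring(s: String, startPos: Int, length: Int): String = {
    val from = math.min(math.max(startPos - 1, 0), s.length)
    val until = math.max(from, math.min(from + length, s.length))
    s.substring(from, until)
  }

  // Remove leading and trailing blanks (spaces) only.
  def trimBlanks(s: String): String = s.dropWhile(_ == ' ').reverse.dropWhile(_ == ' ').reverse

  // SAS-FORMAT-$CHARw: substring(AnyCInput, 1, FMTWIDTH)
  def charW(fmtWidth: Int, input: String): String = substring(input, 1, fmtWidth)

  // SAS-EM-String-Normalize: trimBlanks(uppercase(substring(AnyCInput, 1, FMTWIDTH)))
  def stringNormalize(fmtWidth: Int, input: String): String =
    trimBlanks(substring(input, 1, fmtWidth).toUpperCase(Locale.ROOT))
}
```

For example, normalizing `"  ab cdef"` with a width of 6 keeps `"  ab c"`, upper-cases it and trims the leading blanks, giving `"AB C"`.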
- object Sin extends UnaryArithmetic
- object SinH extends UnaryArithmetic
- object Sqrt extends UnaryArithmetic
- object StdNormalCDF extends UnaryArithmetic
- object StdNormalIDF extends UnaryArithmetic
- object StdNormalPDF extends UnaryArithmetic
- object StringLength extends UnaryFunction
- object Substring extends TernaryFunction
- object Subtract extends BinaryArithmetic
- object Sum extends MultipleArithmetic
- object Tan extends UnaryArithmetic
- object TanH extends UnaryArithmetic
- object TextIndex extends Serializable
- object Threshold extends BinaryArithmetic
- object TrimBlanks extends UnaryString
- object Uppercase extends UnaryString
- object UserDefinedFunctions extends FunctionProvider
Defines several user-defined functions produced by various vendors. Well-defined DefineFunction elements are fully supported by pmml4s, but some vendor-generated ones are not well defined; this object is the place for implementations of those functions.