Utility for automatically assembling columns into a vector of features.
Params for automatic feature-vector assembler.
Simple evaluator based on the mllib.
Estimator used to select the proper item sample rate to achieve the desired size of the resulting sample.
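As a rough illustration of the idea, a sample rate can be derived from the total row count and the target sample size. This is a minimal sketch in plain Python, not the library's implementation; the helper name `sample_rate` and the cap at 1.0 are assumptions.

```python
def sample_rate(total_items: int, desired_size: int) -> float:
    """Fraction of items to keep so the expected sample size is
    close to desired_size (hypothetical helper, for illustration)."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    if desired_size < 0:
        raise ValueError("desired_size must be non-negative")
    # Never ask for more than the whole data set.
    return min(1.0, desired_size / total_items)
```

Because each row is then kept independently with this probability, the actual sample size only approximates the target.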
Model applied as a transformer, but the resulting data set is not deterministic (each pass produces different results).
Follows ideas from Combined Regression and Ranking paper (http://www.
Used to extract a set of columns from the underlying data frame based on names and/or SQL expressions.
Base class for combined model holding a named map of nested models.
Used to train and evaluate a model in folds.
Created by dmitriybugaichenko on 10.
Helper class for training single-label models.
Base class for evaluators.
Created by eugeny.
Created by dmitriybugaichenko on 30.
Created by dmitriybugaichenko on 29.
Utility used to split training into forks (per type, per class, per fold).
Specific case of forked estimator which does not change the type of the underlying model.
Used for evaluators with batch support
For estimators capable of caching training data.
Adds parameter with column for instance classes.
Adds parameter with class weights (defaults to 1.
For vector assemblers used to provide better naming for metadata attributes.
Parameters for specifying which columns to include or exclude.
Created by dmitriybugaichenko on 30.
Supplementary trait used for optimization (moving transformation out of the execution plan into UDF)
Block with information regarding feature significance statistics, produced during the feature selection stage.
Adds parameters for folding - number of folds and name of column with fold number.
For transformers performing grouping by certain columns.
Adds parameter with the name of test/train split column
Metrics block is added by the evaluators.
Created by dmitriybugaichenko on 19.
Used to indicate that last weight should not be considered as a part of regularization (typically if it is the intercept)
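The effect of such a flag can be sketched as an L2 penalty that skips the final weight. This is a minimal plain-Python illustration under stated assumptions: the function name `l2_penalty`, the flag name `last_is_intercept`, and the use of a plain squared-norm penalty are all hypothetical and not taken from the library.

```python
from typing import Sequence


def l2_penalty(weights: Sequence[float], reg: float,
               last_is_intercept: bool) -> float:
    """Squared-norm penalty; optionally leaves the last weight
    (assumed to be the intercept) unregularized."""
    regularized = weights[:-1] if last_is_intercept else weights
    return reg * sum(w * w for w in regularized)
```

With weights `[1.0, 2.0, 10.0]` and `reg=0.5`, the large intercept contributes nothing to the penalty when the flag is set.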
For transformers performing sorting by certain columns.
Adds parameter with column for instance type.
Block produced by models with a concept of feature weights (e.g.
Adds extra column to features vector with a fixed value of 1.
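A minimal sketch of why the constant column is useful: with a trailing 1 in the feature vector, the last weight of a linear model acts as the intercept. The helper names `add_intercept` and `linear_score` are hypothetical, and placing the constant at the end of the vector is an assumption of this example.

```python
from typing import List, Sequence


def add_intercept(features: Sequence[float]) -> List[float]:
    """Return a copy of the features with a constant 1.0 appended."""
    return list(features) + [1.0]


def linear_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Plain dot product of weights and features."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have equal length")
    return sum(w * x for w, x in zip(weights, features))
```

For example, `linear_score([2.0, 3.0, 0.5], add_intercept([1.0, 1.0]))` evaluates 2 + 3 plus an intercept of 0.5.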
:: Experimental :: Isotonic regression.
ml.
Combination model which evaluates ALL nested models and combines results based on linear weights.
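The combination step can be sketched as a weighted sum over the outputs of every nested model. This is an illustrative plain-Python sketch, not the library's code: models are represented as simple callables, and the name `combine_linear` and both dictionaries are assumptions.

```python
from typing import Callable, Dict, Sequence

Model = Callable[[Sequence[float]], float]


def combine_linear(models: Dict[str, Model],
                   weights: Dict[str, float],
                   features: Sequence[float]) -> float:
    """Evaluate every nested model and sum the results,
    each scaled by the weight registered under the same name."""
    return sum(weights[name] * model(features)
               for name, model in models.items())
```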
Single-label linear regression with DSVRGD
Multi-label linear regression with DSVRGD
Multi-label logistic regression with DSVRGD
Multi-label logistic regression with DSVRGD
Utility used to bridge default Spark ML models into our advanced pipelines.
Created by dmitriybugaichenko on 24.
Created by alexander.
One of the main extensions to the base concept of a model - each model might return a summary represented by a named collection of dataframes.
If we can avoid certain stages used during training while predicting, we need to propagate some changes to the model (e.g.
Model which has a summary.
Combination model which evaluates ALL nested models and returns a vector.
Base class for models evaluated per class.
Utility for converting columns with a string or a set of strings into a vector of 0/1 with the cardinality equal to the number of unique string values used.
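The encoding described here can be sketched in two steps: collect the unique string values, then mark each present value with a 1. This is a minimal plain-Python sketch under stated assumptions; `fit_vocabulary` and `encode` are hypothetical names, the sorted index order is an arbitrary choice, and silently ignoring unseen values is an assumption rather than documented behavior.

```python
from typing import Dict, Iterable, List, Set, Union

Cell = Union[str, Set[str]]


def _as_values(cell: Cell) -> Iterable[str]:
    # A single string is treated as a one-element set.
    return [cell] if isinstance(cell, str) else cell


def fit_vocabulary(cells: Iterable[Cell]) -> Dict[str, int]:
    """Map every unique string value to a vector index."""
    values: Set[str] = set()
    for cell in cells:
        values.update(_as_values(cell))
    return {value: index for index, value in enumerate(sorted(values))}


def encode(cell: Cell, vocabulary: Dict[str, int]) -> List[float]:
    """Build a 0/1 vector whose length equals the vocabulary size."""
    vector = [0.0] * len(vocabulary)
    for value in _as_values(cell):
        if value in vocabulary:  # unseen values are skipped (assumption)
            vector[vocabulary[value]] = 1.0
    return vector
```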
Model produced by the multinominal extractor.
Parameters for multinominal feature extractor.
Estimates mean values ignoring NaNs.
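A NaN-ignoring mean can be sketched as follows. This is an illustration in plain Python, not the library's estimator; the name `nan_mean` and the choice to return NaN when no valid values remain are assumptions.

```python
import math
from typing import Iterable


def nan_mean(values: Iterable[float]) -> float:
    """Mean of the values that are not NaN; NaN if none are valid."""
    total = 0.0
    count = 0
    for value in values:
        if not math.isnan(value):
            total += value
            count += 1
    return total / count if count else math.nan
```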
Model used to replace values with pre-computed defaults before training/predicting.
Set of parameters for the replacer
Assuming there is metadata attached to an integer field, can be used to replace ints with the corresponding attribute names.
Utility used to replace null values with defaults (zero or false).
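A minimal sketch of null replacement driven by column type, written in plain Python over dictionaries rather than data frames. The function name `replace_nulls`, the type labels, and the `DEFAULTS` table are hypothetical; only the zero and false defaults come from the description above.

```python
from typing import Any, Dict

# Default per declared column type (labels are assumptions of this sketch).
DEFAULTS: Dict[str, Any] = {"boolean": False, "int": 0, "double": 0.0}


def replace_nulls(row: Dict[str, Any], schema: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of the row where None is replaced by the default
    for the column's type; columns of other types are left untouched."""
    result = dict(row)
    for column, column_type in schema.items():
        if result.get(column) is None and column_type in DEFAULTS:
            result[column] = DEFAULTS[column_type]
    return result
```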
:: Experimental :: A feature transformer that merges multiple columns into a vector column.
Evaluator used to compute metrics for predictions grouped by certain criteria (typically by a user id).
Settings for partitioning, except the number of partitions.
This is a specific implementation of the scaler for linear models.
Scaler parameters.
Selecting model applies exactly one model based on instance type and returns its result.
Serializable wrapper over the TDigest
Simple utility used to apply SQL WHERE filter
Estimator which produces a model with a summary.
Created by eugeny.
Created by eugeny.
If we can avoid certain stages used during training while predicting, we need to propagate some changes to the model (e.g.
Utility used to extract nested values from vectors into dedicated columns.
Utility used to collect detailed statistics for vectors grouped by certain keys.
Utility used for reporting single indexed feature weight.
Adds read logic
Adds read ability.
Helper used to inject common task support with thread count limit into all forked estimators.
Helper for reading and writing models in a typed way.
Adds read logic
Adds read ability
Adds support for reading.
Adds read ability.