Chapter 3 Cheat Sheets
3.1 Basic Function
as_benchmark_result(x, ...) converts an object to a benchmark result, e.g. for visualization.
mlr3 Dictionaries: The dictionaries store all the classes and functions that we can use in the mlr3 library.
-mlr_tasks
-mlr_task_generators
-mlr_learners
-mlr_measures
-mlr_resamplings
Example usage: The method keys() returns the keys of all learners in the dictionary. With only mlr3 loaded, these are the learners built into mlr3. Loading extension packages such as mlr3learners extends the dictionary with additional learners, as in the output below.
mlr_learners$keys(pattern = NULL)
## [1] "classif.cv_glmnet" "classif.debug" "classif.featureless"
## [4] "classif.glmnet" "classif.kknn" "classif.lda"
## [7] "classif.log_reg" "classif.multinom" "classif.naive_bayes"
## [10] "classif.nnet" "classif.qda" "classif.ranger"
## [13] "classif.rpart" "classif.svm" "classif.xgboost"
## [16] "regr.cv_glmnet" "regr.debug" "regr.featureless"
## [19] "regr.glmnet" "regr.kknn" "regr.km"
## [22] "regr.lm" "regr.nnet" "regr.ranger"
## [25] "regr.rpart" "regr.svm" "regr.xgboost"
As a brief introduction, we explain some of the keywords that appear in these keys:
-classif means the learner solves classification problems, regr means it solves regression problems.
-featureless means that the learner ignores all features during training and only considers the response.
-classif.rpart is a LearnerClassif implementing a classification tree, and regr.rpart is a LearnerRegr implementing a regression tree. Unlike featureless, these two learners use the features during training.
-debug is a learner used for debugging purposes.
The method get() retrieves an object by its key. Printing the object shows all the information about it.
mlr_learners$get("classif.rpart")
## <LearnerClassifRpart:classif.rpart>: Classification Tree
## * Model: -
## * Parameters: xval=0
## * Packages: mlr3, rpart
## * Predict Types: [response], prob
## * Feature Types: logical, integer, numeric, factor, ordered
## * Properties: importance, missings, multiclass, selected_features,
## twoclass, weights
The function as.data.table() converts a dictionary to a data.table.
head(as.data.table(mlr_tasks))
## key label task_type nrow ncol properties lgl
## 1: boston_housing Boston Housing Prices regr 506 19 0
## 2: breast_cancer Wisconsin Breast Cancer classif 683 10 twoclass 0
## 3: german_credit German Credit classif 1000 21 twoclass 0
## 4: iris Iris Flowers classif 150 5 multiclass 0
## 5: mtcars Motor Trends regr 32 11 0
## 6: penguins Palmer Penguins classif 344 8 multiclass 0
## int dbl chr fct ord pxc
## 1: 3 13 0 2 0 0
## 2: 0 0 0 0 9 0
## 3: 3 0 0 14 3 0
## 4: 0 4 0 0 0 0
## 5: 0 10 0 0 0 0
## 6: 3 2 0 2 0 0
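The dictionary methods above can be combined as in the following minimal sketch. It assumes only mlr3 and data.table are installed; the regular expression and the variable names are illustrative choices, not part of the original text.

```r
library(mlr3)
library(data.table)

# Keys matching a regular expression: all regression learners in the dictionary
regr_keys = mlr_learners$keys(pattern = "^regr\\.")

# Retrieve one object by key; lrn() is the short form of mlr_learners$get()
learner_a = mlr_learners$get("classif.rpart")
learner_b = lrn("classif.rpart")

# Convert a dictionary to a table and filter it with data.table syntax
tasks = as.data.table(mlr_tasks)
classif_tasks = tasks[task_type == "classif"]$key
```

The pattern argument makes it easy to narrow a long key list to one task type before choosing a learner.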
- Tasks
The type of the target determines the type of machine learning task. We can create a classification task:
task1 = as_task_classif(x = iris, target = "Species")
task1
## <TaskClassif:iris> (150 x 5)
## * Target: Species
## * Properties: multiclass
## * Features (4):
## - dbl (4): Petal.Length, Petal.Width, Sepal.Length, Sepal.Width
We can create a regression task:
age <- c(33, 55, 25)
salary <- c(20000, 50000, 15000)
df <- data.frame(age, salary)
task2 = as_task_regr(x = df, target = "salary")
task2
## <TaskRegr:df> (3 x 2)
## * Target: salary
## * Properties: -
## * Features (1):
## - dbl (1): age
We can also use the example tasks in mlr_tasks by calling tsk(task_name):
task3 = tsk("zoo")
task3
## <TaskClassif:zoo> (101 x 17): Zoo Animals
## * Target: type
## * Properties: multiclass
## * Features (16):
## - lgl (15): airborne, aquatic, backbone, breathes, catsize, domestic,
## eggs, feathers, fins, hair, milk, predator, tail, toothed, venomous
## - int (1): legs
We can perform some functions on the task: task$positive = “
# return number of rows
task1$nrow
## [1] 150
# return number of columns
task1$ncol
## [1] 5
# subset the task by selecting features
task1$select("Sepal.Length")
task$cbind(data) adds columns
task$rbind(data) adds rows
task$feature_names returns the feature names in the task
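Because methods such as select() modify the task in place, it can help to work on a copy. The following is a minimal sketch under that assumption; the chosen features and row range are illustrative only.

```r
library(mlr3)

task = as_task_classif(iris, target = "Species")

# Mutators change the task in place, so clone first to keep the original intact
small = task$clone()
small$select(c("Petal.Length", "Petal.Width"))  # keep two features
small$filter(1:100)                             # keep the first 100 rows

small$feature_names
small$nrow
small$ncol   # features plus the target column
task$ncol    # the original task is unchanged
```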
- Learner
To use a learner, we can call the method using:
learner = mlr_learners$get(method) or
learner = lrn(method)
Here is an example:
learner = lrn("regr.rpart")
learner
## <LearnerRegrRpart:regr.rpart>: Regression Tree
## * Model: -
## * Parameters: xval=0
## * Packages: mlr3, rpart
## * Predict Types: [response]
## * Feature Types: logical, integer, numeric, factor, ordered
## * Properties: importance, missings, selected_features, weights
- Train
We train the chosen learner on the task:
learner$train(task, row_ids)
learner$model: the fitted model is stored here and can be inspected
Split into training and test sets:
train_set = sample(task$nrow, (percentage) * task$nrow)
test_set = setdiff(seq_len(task$nrow), train_set)
- Predict
These two methods predict on the selected data:
prediction = learner$predict(task, row_ids)
prediction = learner$predict_newdata(data)
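The split, train, and predict steps can be put together as in this minimal sketch. The 80/20 split, the seed, and the three rows used as new data are assumptions made for illustration.

```r
library(mlr3)
set.seed(1)

task = tsk("iris")
learner = lrn("classif.rpart")

# 80/20 split of the row ids (0.8 is an illustrative choice)
train_set = sample(task$nrow, 0.8 * task$nrow)
test_set = setdiff(seq_len(task$nrow), train_set)

# Fit on the training rows only; the rpart fit is stored in learner$model
learner$train(task, row_ids = train_set)

# Predict on held-out rows of the same task
prediction = learner$predict(task, row_ids = test_set)

# Predict on a plain data.frame that is not part of any task
new_flowers = iris[c(1, 51, 101), 1:4]
prediction_new = learner$predict_newdata(new_flowers)
prediction_new$response
```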
- Model Evaluation
Here are the model evaluation measures in the mlr_measures dictionary:
mlr_measures$keys(pattern = NULL)
## [1] "aic" "bic" "classif.acc"
## [4] "classif.auc" "classif.bacc" "classif.bbrier"
## [7] "classif.ce" "classif.costs" "classif.dor"
## [10] "classif.fbeta" "classif.fdr" "classif.fn"
## [13] "classif.fnr" "classif.fomr" "classif.fp"
## [16] "classif.fpr" "classif.logloss" "classif.mauc_au1p"
## [19] "classif.mauc_au1u" "classif.mauc_aunp" "classif.mauc_aunu"
## [22] "classif.mbrier" "classif.mcc" "classif.npv"
## [25] "classif.ppv" "classif.prauc" "classif.precision"
## [28] "classif.recall" "classif.sensitivity" "classif.specificity"
## [31] "classif.tn" "classif.tnr" "classif.tp"
## [34] "classif.tpr" "debug" "oob_error"
## [37] "regr.bias" "regr.ktau" "regr.mae"
## [40] "regr.mape" "regr.maxae" "regr.medae"
## [43] "regr.medse" "regr.mse" "regr.msle"
## [46] "regr.pbias" "regr.rae" "regr.rmse"
## [49] "regr.rmsle" "regr.rrse" "regr.rse"
## [52] "regr.rsq" "regr.sae" "regr.smape"
## [55] "regr.srho" "regr.sse" "selected_features"
## [58] "sim.jaccard" "sim.phi" "time_both"
## [61] "time_predict" "time_train"
prediction$score(measures): returns the model evaluation metrics of the selected learner
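As a minimal sketch of scoring, the example below evaluates one prediction with two measures from the list above. The task, learner, split, and seed are illustrative assumptions.

```r
library(mlr3)
set.seed(1)

task = tsk("iris")
learner = lrn("classif.rpart")
train_set = sample(task$nrow, 0.8 * task$nrow)
test_set = setdiff(seq_len(task$nrow), train_set)

learner$train(task, row_ids = train_set)
prediction = learner$predict(task, row_ids = test_set)

# msr() retrieves one measure, msrs() several; score() returns a named numeric vector
scores = prediction$score(msrs(c("classif.acc", "classif.ce")))
scores
```

Accuracy and classification error are complements, so the two values sum to one.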
3.2 Pipeline
Machine learning workflows can be written as directed “Graphs”/“Pipelines” that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. We will most often use the term “Graph” in this manual but it can interchangeably be used with “pipeline” or “workflow”.
Single computational steps can be represented as so-called PipeOps, which can then be connected with directed edges in a Graph. The scope of mlr3pipelines is still growing. Currently supported features are:
- Data manipulation and preprocessing operations, e.g. PCA, feature filtering, imputation
- Task subsampling for speed and outcome class imbalance handling
- mlr3 Learner operations for prediction and stacking
- Ensemble methods and aggregation of predictions
Additionally, we implement several meta operators that can be used to construct powerful pipelines:
- Simultaneous path branching (data going both ways)
- Alternative path branching (data going one specific way, controlled by hyperparameters)
Using methods from `mlr3tuning`, it is even possible to simultaneously optimize parameters of multiple processing units.
### The Building Blocks: PipeOps
The building blocks of mlr3pipelines are PipeOp-objects (PO). They can be constructed directly using PipeOp<NAME>$new(), but the recommended way is to retrieve them from the mlr_pipeops dictionary:
```{r,echo=TRUE}
library("mlr3pipelines")
as.data.table(mlr_pipeops)
```
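A single PipeOp can also be trained on its own, which is a convenient way to see what it does. The sketch below assumes the "pca" PipeOp and the iris example task; inputs and outputs of a PipeOp are lists.

```r
library(mlr3)
library(mlr3pipelines)

# Retrieve a PipeOp from the dictionary with the po() short form
pca = po("pca")

# Training takes a list of inputs and returns a named list of outputs
out = pca$train(list(tsk("iris")))
pca_task = out$output

# The four numeric features are replaced by principal components
pca_task$feature_names
```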
### Nodes, Edges and Graphs
POs are combined into Graphs.
POs are identified by their $id. Note that the Graph-modifying methods, such as $add_pipeop() and $add_edge(), all modify the object in-place and return the object itself. Therefore, multiple modifications can be chained.
A Graph connects PipeOps with edges to control data flow during training and prediction. Input is sent to sources (no in-edges), output is read from sinks (no out-edges).
Important methods and slots:
```
Display: print(gr), gr$plot(html = TRUE)
Accessing PipeOps: gr$pipeops (named list of all contained POs)
```
The %>>% operator takes either a PipeOp or a Graph on each of its sides and connects all left-hand outputs to the right-hand inputs. For full control, connect PipeOps explicitly:
```
gr = Graph$new()
gr$add_pipeop(po("pca"))
gr$add_pipeop(lrn("classif.rpart"))
gr$add_edge("pca", "classif.rpart")
```
GraphLearners behave like Learners and enable all mlr3 features:
```
grl = GraphLearner$new(gr)
```
See slots $encapsulate for debugging and $model for results after training.
Concatenate POs with %>>% to get linear graph.
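A linear graph built with %>>% and wrapped as a learner might look like the following minimal sketch. The choice of PipeOps, task, and measure is an illustrative assumption.

```r
library(mlr3)
library(mlr3pipelines)

# Scale, then PCA, then a classification tree; %>>% connects outputs to inputs
gr = po("scale") %>>% po("pca") %>>% lrn("classif.rpart")
names(gr$pipeops)

# Wrap the graph so it can be used wherever a Learner is expected
grl = GraphLearner$new(gr)
grl$train(tsk("iris"))

# Predicting on the training data only demonstrates the interface
acc = grl$predict(tsk("iris"))$score(msr("classif.acc"))
acc
```

Because the GraphLearner is an ordinary Learner, it can be passed to resampling, benchmarking, and tuning functions unchanged.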
### Modeling
The main purpose of a Graph is to build combined preprocessing and model fitting pipelines that can be used as mlr3 Learner.
#### Setting Hyperparameters
Individual POs offer hyperparameters because they contain $param_set slots that can be read and written from $param_set$values (via the paradox package). The parameters get passed down to the Graph, and finally to the GraphLearner. This makes it not only possible to easily change the behavior of a Graph / GraphLearner and try different settings manually, but also to perform tuning using the mlr3tuning package.
For POs: Exactly as in a Learner.
```
enc = po("encode")
enc$param_set
enc$param_set$values = list(method = "one-hot")
po("encode", param_vals = list(method = "one-hot"))
```
For Graph / GraphLearner: All HPs are collected in a global ParamSet stored in $param_set. IDs are prefixed with the respective PipeOp’s id.
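The prefixing of hyperparameter IDs can be seen in a small sketch. It assumes a two-step graph of "pca" and "classif.rpart"; the values set here are arbitrary.

```r
library(mlr3)
library(mlr3pipelines)

gr = po("pca") %>>% lrn("classif.rpart")

# Every hyperparameter ID is prefixed with the id of its PipeOp
ids = gr$param_set$ids()
head(ids)

# Set values through the Graph's global parameter set
gr$param_set$values$pca.rank. = 2L
gr$param_set$values$classif.rpart.maxdepth = 3L

# The change is visible in the individual PipeOp as well
gr$pipeops$pca$param_set$values$rank.
```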
#### Tuning
Any pipeline can be tuned jointly. The AutoTuner is used in exactly the same way as for a plain Learner.
Details are given in Section 3.3.
### Non-Linear Graphs
The Graphs seen so far all have a linear structure. Some POs may have multiple input or output channels. These channels make it possible to create non-linear Graphs with alternative paths taken by the data.
Possible types are:
Branching: Splitting of a node into several paths, e.g. useful when comparing multiple feature-selection methods (pca, filters). Only one path will be executed.
Copying: Splitting of a node into several paths, all paths will be executed (sequentially). Parallel execution is not yet supported.
Stacking: Single graphs are stacked onto each other, i.e. the output of one Graph is the input for another. In machine learning this means that the prediction of one Graph is used as input for another Graph.
#### Common functions:
`gunion()` arranges PipeOps or Graphs next to each other in a disjoint graph union.
pipeline_greplicate() creates a new Graph containing n copies of the input (PipeOp or Graph).
PipeOpFeatureUnion aggregates features from all input tasks into a single Task.
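As a minimal sketch of copying, the graph below sends the same task through a PCA path and an unchanged path, then merges the features. The "copy" and "nop" PipeOps and the iris task are assumptions chosen for illustration.

```r
library(mlr3)
library(mlr3pipelines)

# Copy the input twice, run PCA on one copy, pass the other through untouched,
# then combine the features of both paths into one task
gr = po("copy", outnum = 2) %>>%
  gunion(list(po("pca"), po("nop"))) %>>%
  po("featureunion")

combined = gr$train(tsk("iris"))[[1]]
combined$feature_names
```

The result holds the four principal components next to the four original features.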
#### Branching & Copying
The PipeOpBranch and PipeOpUnbranch POs make it possible to specify multiple alternative paths. Only one path is actually executed, the others are ignored. The active path is determined by a hyperparameter. This concept makes it possible to tune alternative preprocessing paths (or learner models).
PipeOpBranch opens the fork and PipeOpUnbranch closes it again, merging the alternative paths back into one.
Example:
```
gr = ppl("branch", list(
  pca = po("pca"), scale = po("scale"))
)
# set the "pca" path as the active one:
gr$param_set$values$branch.selection = "pca"
```
Tuning the branching selection enables powerful model selection.
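The effect of the selection hyperparameter can be checked by training the branch graph twice, as in this minimal sketch. The iris task is an illustrative assumption.

```r
library(mlr3)
library(mlr3pipelines)

gr = ppl("branch", list(pca = po("pca"), scale = po("scale")))

# Active path "pca": features become principal components
gr$param_set$values$branch.selection = "pca"
pca_out = gr$train(tsk("iris"))[[1]]

# Active path "scale": original features are kept, only rescaled
gr$param_set$values$branch.selection = "scale"
scale_out = gr$train(tsk("iris"))[[1]]

pca_out$feature_names
scale_out$feature_names
```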
3.3 Hyperparameter Tuning
The table below lists the available terminators, which define when tuning stops:
as.data.table(mlr_terminators)
## key label properties unit
## 1: clock_time Clock Time single-crit,multi-crit seconds
## 2: combo Combination single-crit,multi-crit percent
## 3: evals Number of Evaluation single-crit,multi-crit evaluations
## 4: none None single-crit,multi-crit percent
## 5: perf_reached Performance Level single-crit percent
## 6: run_time Run Time single-crit,multi-crit seconds
## 7: stagnation Stagnation single-crit percent
## 8: stagnation_batch Stagnation Batch single-crit percent
The table below lists the tuner search strategies we can choose from:
as.data.table(mlr_tuners)
## key label
## 1: cmaes Covariance Matrix Adaptation Evolution Strategy
## 2: design_points Design Points
## 3: gensa Generalized Simulated Annealing
## 4: grid_search Grid Search
## 5: irace Iterated Racing
## 6: nloptr Non-linear Optimization
## 7: random_search Random Search
## param_classes
## 1: ParamDbl
## 2: ParamLgl,ParamInt,ParamDbl,ParamFct,ParamUty
## 3: ParamDbl
## 4: ParamLgl,ParamInt,ParamDbl,ParamFct
## 5: ParamDbl,ParamInt,ParamFct,ParamLgl
## 6: ParamDbl
## 7: ParamLgl,ParamInt,ParamDbl,ParamFct
## properties packages
## 1: single-crit mlr3tuning,bbotk,adagio
## 2: dependencies,single-crit,multi-crit mlr3tuning,bbotk
## 3: single-crit mlr3tuning,bbotk,GenSA
## 4: dependencies,single-crit,multi-crit mlr3tuning
## 5: dependencies,single-crit mlr3tuning,bbotk,irace
## 6: single-crit mlr3tuning,bbotk,nloptr
## 7: dependencies,single-crit,multi-crit mlr3tuning,bbotk
The parameters are combined into a multivariate search space ss:
ss = ps(<id> = p_int(lower, upper), <id> = p_dbl(lower, upper), <id> = p_fct(levels), <id> = p_lgl())
Here <id> is the parameter identifier, lower and upper are the bounds of a numeric parameter, and levels are the allowed values of a categorical parameter.
Or, we can use to_tune() to set SS for each parameter.
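A concrete search space might look like the sketch below. The IDs cp and minsplit are rpart hyperparameters; splitrule and use_surrogate are hypothetical IDs that only illustrate p_fct() and p_lgl(). All bounds are arbitrary.

```r
library(paradox)

ss = ps(
  cp = p_dbl(lower = 0.001, upper = 0.1),            # real-valued range
  minsplit = p_int(lower = 1, upper = 10),           # integer range
  splitrule = p_fct(levels = c("gini", "entropy")),  # hypothetical categorical parameter
  use_surrogate = p_lgl()                            # hypothetical logical parameter
)

ss$ids()
ss$length
```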
To tune by hand, we need to define all the arguments of the tuning instance and choose a tuner:
instance = TuningInstanceSingleCrit$new(task, learner, resampling, measure, terminator, ss)
tuner = tnr(<tuner>)
For multi-criteria tuning, use TuningInstanceMultiCrit instead.
Then we run the optimization and access the results:
tuner$optimize(instance)
as.data.table(instance$archive)
learner$param_set$values = instance$result_learner_param_vals
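The manual workflow could be sketched end to end as follows. This assumes a recent mlr3tuning in which the short form ti() constructs the single-criterion tuning instance; the task, measure, budget of five evaluations, and cp range are illustrative choices.

```r
library(mlr3)
library(mlr3tuning)
set.seed(1)

# Mark cp as tunable directly on the learner instead of passing a search space
learner = lrn("classif.rpart", cp = to_tune(0.001, 0.1, logscale = TRUE))

# ti() is assumed to be available (recent mlr3tuning versions)
instance = ti(
  task = tsk("iris"),
  learner = learner,
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  terminator = trm("evals", n_evals = 5)
)

tuner = tnr("random_search")
tuner$optimize(instance)

# One archive row per evaluated configuration
archive = as.data.table(instance$archive)
nrow(archive)

# Write the best configuration back to the learner
learner$param_set$values = instance$result_learner_param_vals
```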
The auto tuner wraps these steps into a single learner:
auto_tuner(
method = tnr(<tuner search strategy>),
learner = lrn(<learner>, cp = to_tune(<lower bound>, <upper bound>, logscale = <TRUE/FALSE>)),
resampling = rsmp(<method>),
measure = msr(<measure>),
term_evals = <#>,
batch_size = <#>
)
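A filled-in version of the template might look like this minimal sketch. The tuner is passed as the first positional argument because the name of that argument differs between mlr3tuning versions, and the batch size is set on the tuner itself. Both choices, like the task and budget, are assumptions.

```r
library(mlr3)
library(mlr3tuning)
set.seed(1)

at = auto_tuner(
  tnr("random_search", batch_size = 5),  # first argument passed by position
  learner = lrn("classif.rpart", cp = to_tune(0.001, 0.1, logscale = TRUE)),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  term_evals = 5
)

# Training tunes cp internally and refits the learner with the best value
at$train(tsk("iris"))
at$tuning_result

prediction = at$predict(tsk("iris"))
```

The returned object is itself a Learner, so it can be resampled to obtain a nested resampling estimate.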
3.4 Feature Selection
The table below lists the feature selection methods (fselectors) we can choose from:
as.data.table(mlr_fselectors)
## key label
## 1: design_points Design Points
## 2: exhaustive_search Exhaustive Search
## 3: genetic_search Genetic Search
## 4: random_search Random Search
## 5: rfe Recursive Feature Elimination
## 6: sequential Sequential Search
## 7: shadow_variable_search Shadow Variable Search
## properties packages
## 1: dependencies,single-crit,multi-crit mlr3fselect,bbotk
## 2: single-crit,multi-crit mlr3fselect
## 3: single-crit mlr3fselect
## 4: single-crit,multi-crit mlr3fselect
## 5: single-crit mlr3fselect
## 6: single-crit mlr3fselect
## 7: single-crit mlr3fselect
First, we perform feature selection by hand. The process is similar to hyperparameter tuning: we define all the arguments and then retrieve the result:
instance = FSelectInstanceSingleCrit$new(task, learner, resampling, measure, terminator)
fselector = fs(<fs method>, batch_size = <number>)
fselector$optimize(instance)
instance$result
We can then restrict the task to the selected features:
task$select(instance$result_feature_set)
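The manual feature selection steps could be sketched as follows. This assumes a recent mlr3fselect in which the short form fsi() constructs the single-criterion instance; the task, learner, and budget are illustrative.

```r
library(mlr3)
library(mlr3fselect)
set.seed(1)

task = tsk("iris")

# fsi() is assumed to be available (recent mlr3fselect versions)
instance = fsi(
  task = task,
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  terminator = trm("evals", n_evals = 5)
)

fselector = fs("random_search", batch_size = 5)
fselector$optimize(instance)

# Names of the best feature subset found
selected = instance$result_feature_set
selected

# Restrict the task to those features (modifies the task in place)
task$select(selected)
```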
Next, we introduce the auto feature selector, which simplifies the process:
autot = auto_fselector(
method = <fselector>,
learner = <your learner>,
resampling = rsmp(<method>),
measure = msr(<measure>),
term_evals = <#>,
batch_size = <#>)
autot$train(task, row_ids)
autot$predict(task, row_ids)
We can check the feature selection subset by calling the learner again:
autot$learner
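A filled-in version might look like the sketch below. As with the auto tuner, the fselector is passed by position because the argument name differs between mlr3fselect versions, and the batch size is set on the fselector. The task and budget are illustrative assumptions.

```r
library(mlr3)
library(mlr3fselect)
set.seed(1)

afs = auto_fselector(
  fs("random_search", batch_size = 5),  # first argument passed by position
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 5
)

# Training runs the feature selection and refits on the selected subset
afs$train(tsk("iris"))
afs$fselect_result

prediction = afs$predict(tsk("iris"))
```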