Chapter 3 Cheat Sheets

3.1 Basic Functions

  • as_benchmark_result(x) converts an object to a benchmark result for visualization

  • mlr3 Dictionaries: the dictionaries store the objects (tasks, learners, measures, ...) that we can use in the mlr3 library:

    - mlr_tasks

    - mlr_task_generators

    - mlr_learners

    - mlr_measures

    - mlr_resamplings

Example usage: the keys() method returns the keys of all learners prebuilt in the mlr3 package. Installing and loading extension packages such as mlr3learners extends the dictionaries with additional entries.

mlr_learners$keys(pattern = NULL)
##  [1] "classif.cv_glmnet"   "classif.debug"       "classif.featureless"
##  [4] "classif.glmnet"      "classif.kknn"        "classif.lda"        
##  [7] "classif.log_reg"     "classif.multinom"    "classif.naive_bayes"
## [10] "classif.nnet"        "classif.qda"         "classif.ranger"     
## [13] "classif.rpart"       "classif.svm"         "classif.xgboost"    
## [16] "regr.cv_glmnet"      "regr.debug"          "regr.featureless"   
## [19] "regr.glmnet"         "regr.kknn"           "regr.km"            
## [22] "regr.lm"             "regr.nnet"           "regr.ranger"        
## [25] "regr.rpart"          "regr.svm"            "regr.xgboost"

For a brief introduction, we explain the keywords that appear in these keys to get a better understanding:

- classif means the learner solves classification problems; regr means it solves regression problems.

- featureless means the learner ignores all features during training and only considers the response.

- classif.rpart is a LearnerClassif implementing a classification tree, and regr.rpart is a LearnerRegr implementing a regression tree. Both use the features during training.

- debug marks learners used for debugging purposes.

The method get() retrieves an object by its key and prints all the information about it:

mlr_learners$get("classif.rpart")
## <LearnerClassifRpart:classif.rpart>: Classification Tree
## * Model: -
## * Parameters: xval=0
## * Packages: mlr3, rpart
## * Predict Types:  [response], prob
## * Feature Types: logical, integer, numeric, factor, ordered
## * Properties: importance, missings, multiclass, selected_features,
##   twoclass, weights

The function as.data.table() converts a dictionary to data.table form:

head(as.data.table(mlr_tasks))
##               key                   label task_type nrow ncol properties lgl
## 1: boston_housing   Boston Housing Prices      regr  506   19              0
## 2:  breast_cancer Wisconsin Breast Cancer   classif  683   10   twoclass   0
## 3:  german_credit           German Credit   classif 1000   21   twoclass   0
## 4:           iris            Iris Flowers   classif  150    5 multiclass   0
## 5:         mtcars            Motor Trends      regr   32   11              0
## 6:       penguins         Palmer Penguins   classif  344    8 multiclass   0
##    int dbl chr fct ord pxc
## 1:   3  13   0   2   0   0
## 2:   0   0   0   0   9   0
## 3:   3   0   0  14   3   0
## 4:   0   4   0   0   0   0
## 5:   0  10   0   0   0   0
## 6:   3   2   0   2   0   0
  • Tasks

The target variable determines the type of machine learning Task. We can create a classification task:

task1 = as_task_classif(x = iris, target = "Species")
task1
## <TaskClassif:iris> (150 x 5)
## * Target: Species
## * Properties: multiclass
## * Features (4):
##   - dbl (4): Petal.Length, Petal.Width, Sepal.Length, Sepal.Width

We can create a regression task:

age <- c(33, 55, 25)
salary <- c(20000, 50000, 15000)
df <- data.frame(age, salary)
task2 = as_task_regr(x = df, target = "salary")
task2
## <TaskRegr:df> (3 x 2)
## * Target: salary
## * Properties: -
## * Features (1):
##   - dbl (1): age

We can also use the example tasks in mlr_tasks by calling tsk(task_name):

task3 = tsk("zoo")
task3
## <TaskClassif:zoo> (101 x 17): Zoo Animals
## * Target: type
## * Properties: multiclass
## * Features (16):
##   - lgl (15): airborne, aquatic, backbone, breathes, catsize, domestic,
##     eggs, feathers, fins, hair, milk, predator, tail, toothed, venomous
##   - int (1): legs

We can call methods and set fields on the task. task$positive = "<class>" sets the positive class for binary classification.

# return the number of rows
task1$nrow
## [1] 150
# return the number of columns
task1$ncol
## [1] 5
# subset the task by selecting features
task1$select("Sepal.Length")

task$cbind(data) adds columns

task$rbind(data) adds rows

task$feature_names returns the feature names of the task
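A minimal sketch of these task operations, using the built-in iris task (the added column is illustrative):

```
task = tsk("iris")
task$feature_names                             # feature names of the task
task$cbind(data.frame(idx = 1:task$nrow))      # add a column
task$select(c("Petal.Length", "Petal.Width"))  # keep only these features
```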

  • Learner

To use a learner, we can retrieve it with either:

learner = mlr_learners$get(method) or

learner = lrn(method)

Here is an example:

learner = lrn("regr.rpart")
learner
## <LearnerRegrRpart:regr.rpart>: Regression Tree
## * Model: -
## * Parameters: xval=0
## * Packages: mlr3, rpart
## * Predict Types:  [response]
## * Feature Types: logical, integer, numeric, factor, ordered
## * Properties: importance, missings, selected_features, weights
  • Train

We train the learner we chose on our task:

learner$train(task, row_ids)

learner$model: the fitted model is stored here and can be inspected

Split into train/test sets:

train_set = sample(task$nrow, (percentage) * task$nrow)

test_set = setdiff(seq_len(task$nrow), train_set)
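For example, a minimal sketch of the full training workflow on a built-in task, assuming an 80/20 split (task and seed are illustrative):

```
task = tsk("penguins")
learner = lrn("classif.rpart")

set.seed(1)  # make the split reproducible
train_set = sample(task$nrow, 0.8 * task$nrow)
test_set = setdiff(seq_len(task$nrow), train_set)

learner$train(task, row_ids = train_set)
learner$model  # inspect the fitted rpart tree
```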

  • Predict

These two methods predict on the selected data:

prediction = learner$predict(task, row_ids)

prediction = learner$predict_newdata(data)
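Continuing the sketch above, we predict on the held-out rows:

```
prediction = learner$predict(task, row_ids = test_set)
prediction  # predicted vs. true classes
```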

  • Model Evaluation

Here are the model evaluation metrics in the mlr_measures dictionary:

mlr_measures$keys(pattern = NULL)
##  [1] "aic"                 "bic"                 "classif.acc"        
##  [4] "classif.auc"         "classif.bacc"        "classif.bbrier"     
##  [7] "classif.ce"          "classif.costs"       "classif.dor"        
## [10] "classif.fbeta"       "classif.fdr"         "classif.fn"         
## [13] "classif.fnr"         "classif.fomr"        "classif.fp"         
## [16] "classif.fpr"         "classif.logloss"     "classif.mauc_au1p"  
## [19] "classif.mauc_au1u"   "classif.mauc_aunp"   "classif.mauc_aunu"  
## [22] "classif.mbrier"      "classif.mcc"         "classif.npv"        
## [25] "classif.ppv"         "classif.prauc"       "classif.precision"  
## [28] "classif.recall"      "classif.sensitivity" "classif.specificity"
## [31] "classif.tn"          "classif.tnr"         "classif.tp"         
## [34] "classif.tpr"         "debug"               "oob_error"          
## [37] "regr.bias"           "regr.ktau"           "regr.mae"           
## [40] "regr.mape"           "regr.maxae"          "regr.medae"         
## [43] "regr.medse"          "regr.mse"            "regr.msle"          
## [46] "regr.pbias"          "regr.rae"            "regr.rmse"          
## [49] "regr.rmsle"          "regr.rrse"           "regr.rse"           
## [52] "regr.rsq"            "regr.sae"            "regr.smape"         
## [55] "regr.srho"           "regr.sse"            "selected_features"  
## [58] "sim.jaccard"         "sim.phi"             "time_both"          
## [61] "time_predict"        "time_train"

prediction$score(measures): computes the selected evaluation metrics for the prediction
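Continuing the sketch, we can score the prediction with one or several measures:

```
prediction$score(msr("classif.acc"))                    # a single measure
prediction$score(msrs(c("classif.acc", "classif.ce")))  # several at once
```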

3.2 Pipeline

Machine learning workflows can be written as directed “Graphs”/“Pipelines” that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. We will most often use the term “Graph” in this manual, but it can be used interchangeably with “pipeline” or “workflow”.

Single computational steps can be represented as so-called PipeOps, which can then be connected with directed edges in a Graph. The scope of mlr3pipelines is still growing. Currently supported features are:

- Data manipulation and preprocessing operations, e.g. PCA, feature filtering, imputation

- Task subsampling for speed and outcome class imbalance handling

- mlr3 Learner operations for prediction and stacking

- Ensemble methods and aggregation of predictions

Additionally, we implement several meta operators that can be used to construct powerful pipelines:

- Simultaneous path branching (data going both ways)

- Alternative path branching (data going one specific way, controlled by hyperparameters)

Using methods from `mlr3tuning`, it is even possible to simultaneously optimize parameters of multiple processing units.

### The Building Blocks: PipeOps

The building blocks of mlr3pipelines are PipeOp-objects (PO). They can be constructed directly using PipeOp<NAME>$new(), but the recommended way is to retrieve them from the mlr_pipeops dictionary:

```{r,echo=TRUE}
library("mlr3pipelines")

as.data.table(mlr_pipeops)
```

### Nodes, Edges and Graphs

POs are combined into Graphs. POs are identified by their $id. Note that the operations all modify the object in place and return the object itself, so multiple modifications can be chained.

A Graph connects PipeOps with edges to control the data flow during training and prediction. Input is sent to sources (nodes with no in-edges), and output is read from sinks (nodes with no out-edges).

Important methods and slots:

```
# Display:
print(gr)
gr$plot(html = TRUE)

# Accessing PipeOps:
gr$pipeops  # named list of all contained POs
```

The %>>% operator takes either a PipeOp or a Graph on each of its sides and connects all left-hand outputs to the right-hand inputs. For full control, connect PipeOps explicitly:

```

gr = Graph$new()
gr$add_pipeop(po("pca"))
gr$add_pipeop(lrn("classif.rpart"))
gr$add_edge("pca", "classif.rpart")

```

A GraphLearner behaves like a Learner and enables all mlr3 features:

```

grl = GraphLearner$new(gr)

```

See slots $encapsulate for debugging and $model for results after training.

Concatenate POs with %>>% to get a linear Graph.
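A minimal sketch, assuming mlr3 and mlr3pipelines are attached:

```
# scale the features, then fit a classification tree
gr = po("scale") %>>% lrn("classif.rpart")
grl = GraphLearner$new(gr)
grl$train(tsk("iris"))
grl$model  # states of all PipeOps after training
```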

### Modeling

The main purpose of a Graph is to build combined preprocessing and model fitting pipelines that can be used as mlr3 Learner.

#### Setting Hyperparameters

Individual POs offer hyperparameters in their $param_set slots, which can be read and written via $param_set$values (using the paradox package). The parameters get passed down to the Graph, and finally to the GraphLearner. This makes it possible not only to easily change the behavior of a Graph / GraphLearner and try different settings manually, but also to perform tuning using the mlr3tuning package.

For POs: Exactly as in a Learner.

```

enc = po("encode")
enc$param_set
enc$param_set$values = list(method = "one-hot")
po("encode", param_vals = list(method = "one-hot"))

```

For Graph / GraphLearner: All HPs are collected in a global ParamSet stored in $param_set. IDs are prefixed with the respective PipeOp’s id.
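A minimal sketch of the prefixing, using the parameter names of the scale and encode PipeOps:

```
gr = po("scale") %>>% po("encode")
# each parameter id is prefixed with its PipeOp's id:
gr$param_set$values$scale.center = FALSE
gr$param_set$values$encode.method = "one-hot"
```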

#### Tuning

Any pipeline can be tuned jointly; usage of the AutoTuner is identical to tuning a plain Learner. Details are given in Section 3.3 below.

### Non-Linear Graphs

The Graphs seen so far all have a linear structure. Some POs may have multiple input or output channels. These channels make it possible to create non-linear Graphs with alternative paths taken by the data.

Possible types are:

Branching: Splitting of a node into several paths, e.g. useful when comparing multiple feature-selection methods (pca, filters). Only one path will be executed.

Copying: Splitting of a node into several paths, all paths will be executed (sequentially). Parallel execution is not yet supported.

Stacking: Single graphs are stacked onto each other, i.e. the output of one Graph is the input for another. In machine learning this means that the prediction of one Graph is used as input for another Graph.

#### Common Functions

`gunion()` arranges PipeOps or Graphs next to each other in a disjoint graph union.

`pipeline_greplicate()` creates a new Graph containing n copies of the input (PipeOp or Graph).

`PipeOpFeatureUnion` aggregates features from all input tasks into a single Task.
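A minimal sketch combining these: run PCA and an untouched copy of the data side by side, then merge the features:

```
# po("nop") passes the task through unchanged
gr = gunion(list(po("pca"), po("nop"))) %>>% po("featureunion")
gr$plot()
```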

#### Branching & Copying

The PipeOpBranch and PipeOpUnbranch POs make it possible to specify multiple alternative paths. Only one path is actually executed, the others are ignored. The active path is determined by a hyperparameter. This concept makes it possible to tune alternative preprocessing paths (or learner models).

PipeOpBranch controls path execution: only one branch can be active, and which one is selected by a hyperparameter. PipeOpUnbranch ends the forking.

Example:

```
gr = ppl("branch", list(
  pca = po("pca"), scale = po("scale")
))

# set the "pca" path as the active one:
gr$param_set$values$branch.selection = "pca"
```

Tuning the branching selection enables powerful model selection.
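For instance, a minimal sketch continuing the branch example above (to_tune() as in Section 3.3):

```
# append a learner and expose the branch choice as a tunable hyperparameter
glrn = as_learner(gr %>>% lrn("classif.rpart"))
glrn$param_set$values$branch.selection = to_tune(c("pca", "scale"))
```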

3.3 Hyperparameter Tuning

The table shows the terminator methods:

as.data.table(mlr_terminators)
##                 key                label             properties        unit
## 1:       clock_time           Clock Time single-crit,multi-crit     seconds
## 2:            combo          Combination single-crit,multi-crit     percent
## 3:            evals Number of Evaluation single-crit,multi-crit evaluations
## 4:             none                 None single-crit,multi-crit     percent
## 5:     perf_reached    Performance Level            single-crit     percent
## 6:         run_time             Run Time single-crit,multi-crit     seconds
## 7:       stagnation           Stagnation            single-crit     percent
## 8: stagnation_batch     Stagnation Batch            single-crit     percent

The table shows the tuner search strategies we can choose from:

as.data.table(mlr_tuners) 
##              key                                           label
## 1:         cmaes Covariance Matrix Adaptation Evolution Strategy
## 2: design_points                                   Design Points
## 3:         gensa                 Generalized Simulated Annealing
## 4:   grid_search                                     Grid Search
## 5:         irace                                 Iterated Racing
## 6:        nloptr                         Non-linear Optimization
## 7: random_search                                   Random Search
##                                   param_classes
## 1:                                     ParamDbl
## 2: ParamLgl,ParamInt,ParamDbl,ParamFct,ParamUty
## 3:                                     ParamDbl
## 4:          ParamLgl,ParamInt,ParamDbl,ParamFct
## 5:          ParamDbl,ParamInt,ParamFct,ParamLgl
## 6:                                     ParamDbl
## 7:          ParamLgl,ParamInt,ParamDbl,ParamFct
##                             properties                packages
## 1:                         single-crit mlr3tuning,bbotk,adagio
## 2: dependencies,single-crit,multi-crit        mlr3tuning,bbotk
## 3:                         single-crit  mlr3tuning,bbotk,GenSA
## 4: dependencies,single-crit,multi-crit              mlr3tuning
## 5:            dependencies,single-crit  mlr3tuning,bbotk,irace
## 6:                         single-crit mlr3tuning,bbotk,nloptr
## 7: dependencies,single-crit,multi-crit        mlr3tuning,bbotk

The parameters are combined into a multivariate search space ss:

ss = ps(
  <id> = p_int(lower, upper),
  <id> = p_dbl(lower, upper),
  <id> = p_fct(levels),
  <id> = p_lgl()
)

Here <id> is the parameter identifier; lower and upper are the bounds, and levels are the allowed values.

Alternatively, we can use to_tune() to set the search space for each parameter directly in the learner.

To tune by hand, we need to define all the arguments of the tuning instance:

instance = TuningInstanceSingleCrit$new(task, learner, resampling, measure, terminator, ss)

tuner = tnr(<tuner>)

For multi-criteria tuning, use TuningInstanceMultiCrit instead.

Then we access the results:

tuner$optimize(instance)

as.data.table(instance$archive)

learner$param_set$values = instance$result_learner_param_vals
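A minimal sketch of the whole manual tuning loop (task, learner, bounds, and budget are illustrative):

```
library(mlr3)
library(mlr3tuning)
library(paradox)

ss = ps(cp = p_dbl(lower = 0.001, upper = 0.1))
instance = TuningInstanceSingleCrit$new(
  task = tsk("penguins"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 10),
  search_space = ss
)
tuner = tnr("grid_search")
tuner$optimize(instance)
instance$result  # best configuration found
```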

The auto tuner:

auto_tuner(

method = tnr(<tuner search strategy>),

learner = lrn(<learner>, cp = to_tune(lower, upper, logscale = <TRUE/FALSE>)),

resampling = rsmp(<method>),

measure = msr(<measure>),

term_evals = <#>,

batch_size = <#>

)
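A minimal sketch filling in the template (values are illustrative):

```
at = auto_tuner(
  method = tnr("grid_search"),
  learner = lrn("classif.rpart", cp = to_tune(0.001, 0.1, logscale = TRUE)),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.acc"),
  term_evals = 20,
  batch_size = 5
)
at$train(tsk("penguins"))
at$tuning_result  # the selected hyperparameters
```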

3.4 Feature Selection

The table shows the feature selection methods (fselectors) we can choose from:

as.data.table(mlr_fselectors) 
##                       key                         label
## 1:          design_points                 Design Points
## 2:      exhaustive_search             Exhaustive Search
## 3:         genetic_search                Genetic Search
## 4:          random_search                 Random Search
## 5:                    rfe Recursive Feature Elimination
## 6:             sequential             Sequential Search
## 7: shadow_variable_search        Shadow Variable Search
##                             properties          packages
## 1: dependencies,single-crit,multi-crit mlr3fselect,bbotk
## 2:              single-crit,multi-crit       mlr3fselect
## 3:                         single-crit       mlr3fselect
## 4:              single-crit,multi-crit       mlr3fselect
## 5:                         single-crit       mlr3fselect
## 6:                         single-crit       mlr3fselect
## 7:                         single-crit       mlr3fselect

First, we can perform feature selection by hand. The process is similar to hyperparameter tuning: we define all the arguments, then retrieve the result:

instance = FSelectInstanceSingleCrit$new(task, learner, resampling, measure, terminator)

fselector = fs(<fs method>, batch_size = <number>)

fselector$optimize(instance)

instance$result

We can then subset the task to the selected features: task$select(instance$result_feature_set)
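A minimal sketch of this manual process (task, learner, and budget are illustrative):

```
library(mlr3)
library(mlr3fselect)

task = tsk("penguins")
instance = FSelectInstanceSingleCrit$new(
  task = task,
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.acc"),
  terminator = trm("evals", n_evals = 10)
)
fselector = fs("random_search", batch_size = 5)
fselector$optimize(instance)
instance$result_feature_set               # the selected features
task$select(instance$result_feature_set)  # keep only those features
```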

Next, we will introduce the auto feature selector that eases the process:

autot = auto_fselector(

method = <fselector>,

learner = <your learner>,

resampling = rsmp(<method>),

measure = msr(<measure>),

term_evals = <#>,

batch_size = <#>
)

autot$train(task, row_ids)

autot$predict(task, row_ids)

We can check the selected feature subset by inspecting the learner:

autot$learner
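A minimal sketch filling in the template (values are illustrative):

```
autot = auto_fselector(
  method = fs("random_search"),
  learner = lrn("classif.rpart"),
  resampling = rsmp("holdout"),
  measure = msr("classif.acc"),
  term_evals = 20,
  batch_size = 5
)
autot$train(tsk("penguins"))
autot$learner  # learner trained on the selected feature subset
```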