Use cross-validation to find the optimal nrounds for a mixgb imputer. Note that this method relies on the complete cases of the dataset to obtain the optimal nrounds.
Usage
mixgb_cv(
  data,
  nfold = 5,
  nrounds = 100,
  early_stopping_rounds = 10,
  response = NULL,
  select_features = NULL,
  xgb.params = list(),
  stringsAsFactors = FALSE,
  verbose = TRUE,
  ...
)
Arguments
- data
A data.frame or a data.table with missing values.
- nfold
The number of equal-sized subsamples into which the data are randomly partitioned. Default: 5
- nrounds
The maximum number of iterations in XGBoost training. Default: 100
- early_stopping_rounds
An integer value k. Training will stop if the validation performance has not improved for k rounds. Default: 10
- response
The name or the column index of a response variable. Default: NULL (randomly select an incomplete variable).
- select_features
The names or the indices of selected features. Default: NULL (select all the other variables in the dataset).
A list of XGBoost parameters. For more details, please check XGBoost documentation on parameters.
- stringsAsFactors
A logical value indicating whether all character vectors in the dataset should be converted to factors.
- verbose
A logical value indicating whether to print cross-validation results during the process.
- ...
Extra arguments to be passed to XGBoost.
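By default, mixgb_cv picks an incomplete variable at random as the validation response. To tune nrounds against a specific variable instead, response and select_features can be set explicitly. A minimal sketch, assuming BMPHEAD, BMPRECUM, and BMPWT are columns of nhanes3:

```r
library(mixgb)

params <- list(max_depth = 3, subsample = 0.7, nthread = 2)

# Tune nrounds using a fixed response variable and a subset of
# predictors (column names assumed here for illustration).
cv.head <- mixgb_cv(
  data = nhanes3,
  response = "BMPHEAD",
  select_features = c("BMPRECUM", "BMPWT"),
  xgb.params = params,
  verbose = FALSE
)
cv.head$best.nrounds
```

Fixing the response makes the cross-validation results reproducible across calls (up to the random fold assignment), which can be useful when comparing xgb.params settings.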
Examples
params <- list(max_depth = 3, subsample = 0.7, nthread = 2)
cv.results <- mixgb_cv(data = nhanes3, xgb.params = params)
#> [1] train-rmse:48.076607+0.057642 test-rmse:48.104235+0.281055
#> Multiple eval metrics are present. Will use test_rmse for early stopping.
#> Will train until test_rmse hasn't improved in 10 rounds.
#>
#> [2] train-rmse:33.883465+0.067183 test-rmse:33.881648+0.256251
#> [3] train-rmse:23.927855+0.041960 test-rmse:23.947425+0.284434
#> [4] train-rmse:16.920347+0.052635 test-rmse:16.937598+0.294499
#> [5] train-rmse:12.043112+0.057752 test-rmse:12.089042+0.293432
#> [6] train-rmse:8.664317+0.063218 test-rmse:8.727653+0.320710
#> [7] train-rmse:6.348849+0.045646 test-rmse:6.422782+0.382683
#> [8] train-rmse:4.784457+0.058809 test-rmse:4.890129+0.439278
#> [9] train-rmse:3.764482+0.070585 test-rmse:3.919752+0.478493
#> [10] train-rmse:3.090149+0.089520 test-rmse:3.283148+0.508359
#> [11] train-rmse:2.682340+0.104265 test-rmse:2.930222+0.537279
#> [12] train-rmse:2.415356+0.103885 test-rmse:2.713623+0.562085
#> [13] train-rmse:2.278572+0.114202 test-rmse:2.619390+0.555772
#> [14] train-rmse:2.190517+0.134924 test-rmse:2.579914+0.536981
#> [15] train-rmse:2.112429+0.119367 test-rmse:2.538028+0.539973
#> [16] train-rmse:2.064155+0.111288 test-rmse:2.526319+0.538636
#> [17] train-rmse:2.024054+0.123361 test-rmse:2.510380+0.546990
#> [18] train-rmse:1.993375+0.111570 test-rmse:2.502675+0.546324
#> [19] train-rmse:1.963972+0.100305 test-rmse:2.507824+0.543798
#> [20] train-rmse:1.931319+0.097409 test-rmse:2.511840+0.537009
#> [21] train-rmse:1.905979+0.091477 test-rmse:2.524172+0.528898
#> [22] train-rmse:1.869336+0.091676 test-rmse:2.517119+0.523485
#> [23] train-rmse:1.840289+0.096952 test-rmse:2.520033+0.512642
#> [24] train-rmse:1.814250+0.085057 test-rmse:2.516780+0.511387
#> [25] train-rmse:1.799222+0.089272 test-rmse:2.522677+0.517764
#> [26] train-rmse:1.778484+0.084601 test-rmse:2.522758+0.509422
#> [27] train-rmse:1.752968+0.088190 test-rmse:2.536694+0.505922
#> [28] train-rmse:1.735900+0.098270 test-rmse:2.529148+0.507148
#> Stopping. Best iteration:
#> [18] train-rmse:1.993375+0.111570 test-rmse:2.502675+0.546324
#>
cv.results$best.nrounds
#> [1] 18
imputed.data <- mixgb(data = nhanes3, m = 3, xgb.params = params, nrounds = cv.results$best.nrounds)
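mixgb returns a list of m imputed datasets, so each completed dataset can be inspected or analysed separately. A brief sketch, assuming the call above has run:

```r
# imputed.data is a list of m = 3 complete datasets.
length(imputed.data)
#> [1] 3

# Inspect the first imputed dataset.
head(imputed.data[[1]])

# Analyses can be repeated across imputations, e.g. with lapply()
# (summary() here is a placeholder for any per-dataset analysis).
lapply(imputed.data, summary)
```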