Use cross-validation to find the optimal nrounds for a mixgb imputer. Note that this method uses only the complete cases of the dataset to determine the optimal nrounds.
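Because the cross-validation is run on complete cases only, it can be worth checking how many fully observed rows the data actually contain before relying on the result. A minimal check with base R, using the nhanes3 example dataset shipped with mixgb:

library(mixgb)
# Number of fully observed rows available for cross-validation
sum(complete.cases(nhanes3))
# Proportion of rows that are complete cases
mean(complete.cases(nhanes3))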
Usage
mixgb_cv(
data,
nfold = 5,
nrounds = 100,
early_stopping_rounds = 10,
response = NULL,
select_features = NULL,
xgb.params = list(),
stringsAsFactors = FALSE,
verbose = TRUE,
...
)
Arguments
- data
A data.frame or a data.table with missing values.
- nfold
The number of equally sized subsamples into which the data are randomly partitioned. Default: 5
- nrounds
The maximum number of iterations in XGBoost training. Default: 100
- early_stopping_rounds
An integer value k. Training will stop if the validation performance has not improved for k rounds. Default: 10
- response
The name or the column index of a response variable. Default:
NULL (randomly select an incomplete variable; see the example after this list).
- select_features
The names or the indices of selected features. Default:
NULL (select all the other variables in the dataset).
- xgb.params
A list of XGBoost parameters. For details, see the XGBoost documentation on parameters.
- stringsAsFactors
A logical value indicating whether all character vectors in the dataset should be converted to factors.
- verbose
A logical value indicating whether to print cross-validation results during training.
- ...
Extra arguments to be passed to XGBoost.
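Because the response is chosen at random by default, best.nrounds can vary between calls. Fixing the response and its features makes the result reproducible. A sketch, assuming nhanes3 contains a numeric variable BMPHEAD and the covariates named below:

# BMPHEAD, HSAGEIR, HSSEX and BMPRECUM are assumed to be columns of nhanes3
cv.head <- mixgb_cv(data = nhanes3, nfold = 5, nrounds = 100,
                    early_stopping_rounds = 10,
                    response = "BMPHEAD",
                    select_features = c("HSAGEIR", "HSSEX", "BMPRECUM"),
                    verbose = FALSE)
cv.head$best.nrounds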
Examples
# XGBoost hyperparameters passed to mixgb_cv
params <- list(max_depth = 3, subsample = 0.7, nthread = 2)
cv.results <- mixgb_cv(data = nhanes3, xgb.params = params)
#> Multiple eval metrics are present. Will use test_rmse for early stopping.
#> Will train until test_rmse hasn't improved in 10 rounds.
#>
#> [1] train-rmse:1.325541±0.054794 test-rmse:1.328004±0.200554
#> [2] train-rmse:1.120618±0.057377 test-rmse:1.120987±0.228403
#> [3] train-rmse:0.977183±0.060819 test-rmse:1.008580±0.243726
#> [4] train-rmse:0.890541±0.060028 test-rmse:0.943404±0.259013
#> [5] train-rmse:0.838193±0.062854 test-rmse:0.900796±0.266055
#> [6] train-rmse:0.790665±0.062452 test-rmse:0.871239±0.274423
#> [7] train-rmse:0.759332±0.064733 test-rmse:0.857790±0.279167
#> [8] train-rmse:0.741639±0.063543 test-rmse:0.849731±0.279353
#> [9] train-rmse:0.730104±0.063602 test-rmse:0.850516±0.277500
#> [10] train-rmse:0.702745±0.048944 test-rmse:0.858593±0.274182
#> [11] train-rmse:0.691603±0.050012 test-rmse:0.860244±0.273899
#> [12] train-rmse:0.687875±0.048874 test-rmse:0.859973±0.274179
#> [13] train-rmse:0.681551±0.050469 test-rmse:0.863008±0.274802
#> [14] train-rmse:0.664586±0.041199 test-rmse:0.863226±0.276052
#> [15] train-rmse:0.660445±0.041594 test-rmse:0.862346±0.276682
#> [16] train-rmse:0.656897±0.043685 test-rmse:0.862776±0.276146
#> [17] train-rmse:0.649109±0.040902 test-rmse:0.861455±0.272649
#> Stopping. Best iteration:
#> [18] train-rmse:0.638577±0.033485 test-rmse:0.873910±0.267153
#>
#> [18] train-rmse:0.638577±0.033485 test-rmse:0.873910±0.267153
cv.results$best.nrounds
#> [1] 8
# Impute the data m = 3 times, using the tuned number of boosting rounds
imputed.data <- mixgb(data = nhanes3, m = 3, xgb.params = params,
                      nrounds = cv.results$best.nrounds)
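A quick sanity check on the result, assuming the default return format of a list of m imputed datasets:

# Count remaining NAs in each of the m imputed datasets (expect all zeros)
sapply(imputed.data, function(d) sum(is.na(d)))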
