Use cross-validation to find the optimal nrounds for a mixgb imputer. Note that this method relies on the complete cases of the dataset to obtain the optimal nrounds.

Usage

mixgb_cv(
  data,
  nfold = 5,
  nrounds = 100,
  early_stopping_rounds = 10,
  response = NULL,
  select_features = NULL,
  xgb.params = list(),
  stringsAsFactors = FALSE,
  verbose = TRUE,
  ...
)

Arguments

data

A data.frame or a data.table with missing values.

nfold

The number of folds. The data are randomly partitioned into this many subsamples of equal size. Default: 5

nrounds

The maximum number of boosting iterations in XGBoost training. Default: 100

early_stopping_rounds

An integer value k. Training will stop if the validation performance has not improved for k rounds. Default: 10

response

The name or the column index of the response variable. Default: NULL (an incomplete variable is selected at random). See the sketch after this argument list for setting the response and features explicitly.

select_features

The names or the column indices of the selected features. Default: NULL (all other variables in the dataset are used).

xgb.params

A list of XGBoost parameters. For more details, see the XGBoost documentation on parameters.

stringsAsFactors

A logical value indicating whether all character vectors in the dataset should be converted to factors. Default: FALSE

verbose

A logical value indicating whether to print cross-validation results during training. Default: TRUE

...

Extra arguments to be passed to XGBoost.
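
The following sketch shows how response, select_features and xgb.params can be set explicitly rather than left at their defaults. The dataset name your_data, the response index and the feature indices are placeholders; substitute variables from your own data.

# Not run: placeholders only. Replace your_data, the response index and
# the feature indices with an incomplete variable and features from your
# own dataset.
cv.fixed <- mixgb_cv(
  data = your_data,
  response = 1,
  select_features = 2:4,
  xgb.params = list(max_depth = 3, subsample = 0.7, nthread = 2),
  verbose = FALSE
)
cv.fixed$best.nrounds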

Value

A list containing the optimal nrounds, the evaluation.log, and the chosen response variable.
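
For instance, using the cv.results object created in the Examples below, the components can be inspected as follows. Only best.nrounds appears in the captured output; the names evaluation.log and response are assumed from the description above.

cv.results$best.nrounds    # optimal number of boosting rounds
cv.results$evaluation.log  # per-round cross-validation results (assumed name)
cv.results$response        # the response variable that was used (assumed name)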

Examples

params <- list(max_depth = 3, subsample = 0.7, nthread = 2)
cv.results <- mixgb_cv(data = nhanes3, xgb.params = params)
#> Multiple eval metrics are present. Will use test_rmse for early stopping.
#> Will train until test_rmse hasn't improved in 10 rounds.
#> 
#> [1]	train-rmse:1.325541±0.054794	test-rmse:1.328004±0.200554 
#> [2]	train-rmse:1.120618±0.057377	test-rmse:1.120987±0.228403 
#> [3]	train-rmse:0.977183±0.060819	test-rmse:1.008580±0.243726 
#> [4]	train-rmse:0.890541±0.060028	test-rmse:0.943404±0.259013 
#> [5]	train-rmse:0.838193±0.062854	test-rmse:0.900796±0.266055 
#> [6]	train-rmse:0.790665±0.062452	test-rmse:0.871239±0.274423 
#> [7]	train-rmse:0.759332±0.064733	test-rmse:0.857790±0.279167 
#> [8]	train-rmse:0.741639±0.063543	test-rmse:0.849731±0.279353 
#> [9]	train-rmse:0.730104±0.063602	test-rmse:0.850516±0.277500 
#> [10]	train-rmse:0.702745±0.048944	test-rmse:0.858593±0.274182 
#> [11]	train-rmse:0.691603±0.050012	test-rmse:0.860244±0.273899 
#> [12]	train-rmse:0.687875±0.048874	test-rmse:0.859973±0.274179 
#> [13]	train-rmse:0.681551±0.050469	test-rmse:0.863008±0.274802 
#> [14]	train-rmse:0.664586±0.041199	test-rmse:0.863226±0.276052 
#> [15]	train-rmse:0.660445±0.041594	test-rmse:0.862346±0.276682 
#> [16]	train-rmse:0.656897±0.043685	test-rmse:0.862776±0.276146 
#> [17]	train-rmse:0.649109±0.040902	test-rmse:0.861455±0.272649 
#> Stopping. Best iteration:
#> [18]	train-rmse:0.638577±0.033485	test-rmse:0.873910±0.267153
#> 
#> [18]	train-rmse:0.638577±0.033485	test-rmse:0.873910±0.267153 
cv.results$best.nrounds
#> [1] 8

imputed.data <- mixgb(data = nhanes3, m = 3, xgb.params = params,
                      nrounds = cv.results$best.nrounds)
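
# mixgb() returns a list of m imputed datasets; a quick check of the
# first completed dataset (output not shown):
head(imputed.data[[1]])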