Generates multiply imputed datasets using XGBoost, subsampling, and predictive mean matching (PMM).
Usage
mixgb(
data,
m = 5,
maxit = 1,
ordinalAsInteger = FALSE,
pmm.type = NULL,
pmm.k = 5,
pmm.link = "prob",
initial.num = "normal",
initial.int = "mode",
initial.fac = "mode",
save.models = FALSE,
save.vars = NULL,
save.models.folder = NULL,
verbose = F,
xgb.params = list(),
nrounds = 100,
early_stopping_rounds = NULL,
print_every_n = 10L,
xgboost_verbose = 0,
...
)
Arguments
- data
A data.frame or data.table with missing values
- m
The number of imputed datasets. Default: 5
- maxit
The number of imputation iterations. Default: 1
- ordinalAsInteger
Whether to convert ordinal factors to integers. By default, ordinalAsInteger = FALSE. Setting ordinalAsInteger = TRUE may speed up the imputation process for large datasets.
- pmm.type
The type of predictive mean matching (PMM). Possible values:
NULL (default): imputations without PMM;
0: imputations with PMM type 0;
1: imputations with PMM type 1;
2: imputations with PMM type 2;
"auto": imputations with PMM type 2 for numeric/integer variables and without PMM for categorical variables.
- pmm.k
The number of donors for predictive mean matching. Default: 5
- pmm.link
The link for predictive mean matching in binary variables:
"prob" (default): use probabilities;
"logit": use logit values.
- initial.num
Initial imputation method for numeric type data:
"normal" (default); "mean"; "median"; "mode"; "sample".
- initial.int
Initial imputation method for integer type data:
"mode" (default); "sample".
- initial.fac
Initial imputation method for factor type data:
"mode" (default); "sample".
- save.models
Whether to save imputation models for imputing new data later on. Default: FALSE
- save.vars
The response variables whose imputation models will be saved for imputing new data. The values in save.vars can be a vector of names or indices. By default (save.vars = NULL), only the imputation models for variables with missing values in the original data will be saved. To save imputation models for all variables, users can specify save.vars = colnames(data).
- save.models.folder
Users can specify a directory to save all imputation models. Models will be saved in JSON format by internally calling xgb.save(), which is recommended by XGBoost.
- verbose
Verbose setting for mixgb. If TRUE, will print out the progress of imputation. Default: FALSE.
- xgb.params
A list of XGBoost parameters. For more details, please check XGBoost documentation on parameters.
- nrounds
The maximum number of boosting iterations for XGBoost. Default: 100
- early_stopping_rounds
An integer value k. XGBoost training will stop if the validation performance has not improved for k rounds. Default: NULL (no early stopping).
- print_every_n
Print XGBoost evaluation information at every nth iteration if xgboost_verbose > 0.
- xgboost_verbose
Verbose setting for XGBoost training: 0 (silent), 1 (print information) and 2 (print additional information). Default: 0
- ...
Extra arguments to be passed to XGBoost
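The pmm.type and pmm.k arguments above control predictive mean matching. As a rough illustration of the donor idea only, the base R sketch below matches each missing case to the k observed cases with the closest predicted values and borrows the observed value of one randomly chosen donor. The helper pmm_sketch and its arguments are hypothetical and are not part of mixgb; the sketch does not distinguish between PMM types 0, 1 and 2.

```r
# Hypothetical helper illustrating donor-based matching; not mixgb's internal code.
pmm_sketch <- function(yhat_obs, yhat_mis, y_obs, k = 5) {
  vapply(yhat_mis, function(p) {
    # indices of the k observed cases whose predictions are closest to p
    donors <- order(abs(yhat_obs - p))[seq_len(min(k, length(y_obs)))]
    # draw one donor at random and borrow its observed value
    y_obs[donors[sample.int(length(donors), 1)]]
  }, numeric(1))
}

set.seed(1)
y_obs    <- c(10, 12, 15, 20, 30)  # observed values
yhat_obs <- c(11, 12, 14, 21, 29)  # predictions for the observed cases
yhat_mis <- c(13, 28)              # predictions for the missing cases
imputed  <- pmm_sketch(yhat_obs, yhat_mis, y_obs, k = 2)
```

Because every imputed value is copied from a donor, the results always lie within the set of observed values, which is the main reason to use PMM for numeric variables.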
Value
If save.models = FALSE, this function will return a list of m imputed datasets. If save.models = TRUE, it will return an object with imputed datasets, saved models and parameters.
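As a minimal sketch of working with the returned list when save.models = FALSE, the snippet below assumes the mixgb package is installed and uses the nhanes3 dataset from the Examples section. The object names imputed.list and first.data are illustrative.

```r
library(mixgb)

params <- list(max_depth = 3, subsample = 0.7, nthread = 2)
imputed.list <- mixgb(data = nhanes3, m = 2, xgb.params = params, nrounds = 10)

length(imputed.list)               # number of imputed datasets (m)
first.data <- imputed.list[[1]]    # extract one completed dataset
dim(first.data)                    # dimensions of the completed dataset
anyNA(first.data)                  # check whether any missing values remain
```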
Examples
# obtain m multiply imputed datasets without saving models
params <- list(max_depth = 3, subsample = 0.7, nthread = 2)
mixgb.data <- mixgb(data = nhanes3, m = 2, xgb.params = params, nrounds = 10)
# obtain m multiply imputed datasets and save models for imputing new data later on
mixgb.obj <- mixgb(data = nhanes3, m = 2, xgb.params = params, nrounds = 10,
save.models = TRUE, save.models.folder = tempdir())
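A further sketch, using only arguments documented above, shows how PMM and the initial imputation method might be set. It assumes the mixgb package is installed and that nhanes3 is available as in the examples above; the object name mixgb.pmm is illustrative.

```r
library(mixgb)

params <- list(max_depth = 3, subsample = 0.7, nthread = 2)

# PMM type 2 for numeric/integer variables, no PMM for categorical variables
mixgb.pmm <- mixgb(data = nhanes3, m = 2,
                   pmm.type = "auto", pmm.k = 5,
                   initial.num = "median",
                   xgb.params = params, nrounds = 10)
```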