Title: | Proper Scoring Rules for Missing Value Imputation |
---|---|
Description: | Implementation of a KL-based scoring rule to assess the quality of different missing value imputations in the broad sense as introduced in Michel et al. (2021) <arXiv:2106.03742>. |
Authors: | Loris Michel, Meta-Lina Spohn, Jeffrey Naef |
Maintainer: | Loris Michel <[email protected]> |
License: | GPL-3 |
Version: | 1.1.0 |
Built: | 2025-03-12 04:30:51 UTC |
Source: | https://github.com/missvalteam/iscores |
Balancing of Classes
class.balancing(X.proj.complete, Y.proj, drawA, Xhat, ids.with.missing, vars)
X.proj.complete |
matrix with complete projected observations. |
Y.proj |
matrix with projected imputed observations. |
drawA |
vector of indices corresponding to current missingness pattern. |
Xhat |
matrix of full imputed observations. |
ids.with.missing |
vector of indices of observations with missing values. |
vars |
vector of variables in the projection. |
a list of new X.proj.complete and Y.proj.
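The help page does not say how the two classes are balanced. As a rough illustration only, the sketch below assumes that balancing means upsampling the smaller class with replacement until both matrices have the same number of rows. The helper `balance_by_upsampling` is hypothetical and is not the package's implementation.

```r
# Hypothetical helper, not part of the package: upsample the smaller of two
# matrices (rows = observations) so that both classes have equally many rows.
balance_by_upsampling <- function(A, B) {
  n.a <- nrow(A)
  n.b <- nrow(B)
  if (n.a < n.b) {
    # draw the missing rows of A with replacement
    extra <- sample.int(n.a, n.b - n.a, replace = TRUE)
    A <- rbind(A, A[extra, , drop = FALSE])
  } else if (n.b < n.a) {
    # draw the missing rows of B with replacement
    extra <- sample.int(n.b, n.a - n.b, replace = TRUE)
    B <- rbind(B, B[extra, , drop = FALSE])
  }
  # return names mirror the documented return value
  list(X.proj.complete = A, Y.proj = B)
}

set.seed(1)
A <- matrix(rnorm(20), ncol = 2)  # 10 observations
B <- matrix(rnorm(8), ncol = 2)   # 4 observations
balanced <- balance_by_upsampling(A, B)
```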
Combining projection forests
combine2Forests(mod1, mod2)
mod1 |
first forest |
mod2 |
second forest |
a new forest combining the first and the second forest
Combining a list of forests
combineForests(list.rf)
list.rf |
a list of forests |
a forest combination of the forests stored in list.rf
Compute the density ratio score
compute_drScore(object, Z = Z, num.trees.per.proj, num.proj)
object |
a crf object. |
Z |
a matrix of candidate points. |
num.trees.per.proj |
an integer, the number of trees per projection. |
num.proj |
an integer specifying the number of projections. |
a numeric value, the DR I-Score.
Computation of the density ratio score
densityRatioScore( X, Xhat, x = NULL, num.proj = 10, num.trees.per.proj = 1, projection.function = NULL, min.node.size = 1, normal.proj = T )
X |
a matrix of the observed data containing missing values. |
Xhat |
a matrix of imputations having same size as X. |
x |
pattern of missing values. |
num.proj |
an integer specifying the number of projections. |
num.trees.per.proj |
an integer, the number of trees per projection. |
projection.function |
a function providing the user-specific projections. |
min.node.size |
the minimum number of observations in a leaf of a tree. |
normal.proj |
a boolean. If TRUE, sample variables from the NA entries of the pattern and additionally from the non-NA entries. If FALSE, sample only from the NA entries of the pattern. |
a fitted random forest based on random projections
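The idea behind a classifier-based density ratio can be illustrated in a few lines of base R. This is a minimal sketch, not the package's code: it swaps in logistic regression (`glm`) for the random forests on random projections, uses simulated one-dimensional data, and the truncation bound `eps` is an assumed value. With balanced classes, the class probability p(x) yields the density ratio p(x) / (1 - p(x)), and averaging its logarithm gives a KL-type quantity.

```r
set.seed(42)
n <- 500
observed <- data.frame(x = rnorm(n, mean = 0))  # stand-in for observed values
imputed  <- data.frame(x = rnorm(n, mean = 1))  # stand-in for imputed values

# Label the two samples and fit a classifier (logistic regression here,
# swapped in for random forests to keep the sketch in base R).
dat <- rbind(cbind(observed, y = 1), cbind(imputed, y = 0))
fit <- glm(y ~ x, data = dat, family = binomial())

# Estimated probability that a point comes from the observed sample.
p <- predict(fit, newdata = imputed, type = "response")

# Truncate away from 0 and 1 so the logarithm stays finite
# (the bound eps is an assumption made for this sketch).
eps <- 1e-9
p <- pmin(pmax(p, eps), 1 - eps)

# With balanced classes, p / (1 - p) estimates the density ratio
# observed / imputed; the mean log ratio over the imputed points
# estimates -KL(imputed || observed), so values closer to 0 are better.
score <- mean(log(p / (1 - p)))
```

In this toy setting the two samples differ by a mean shift, so the score is clearly negative; it would approach 0 if the imputed values followed the same distribution as the observed ones.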
doevaluation: compute the imputation KL-based scoring rules
doevaluation( imputations, methods, X.NA, m, num.proj, num.trees.per.proj, min.node.size, n.cores = 1, projection.function = NULL )
imputations |
a list of lists of imputation matrices, each containing no missing values and having the same size as X.NA. |
methods |
a vector of characters indicating which methods are considered for imputations. It should have the same length as the list imputations. |
X.NA |
a matrix containing missing values, the data to impute. |
m |
the number of multiple imputations to consider, defaulting to the number of provided multiple imputations. |
num.proj |
an integer specifying the number of projections to consider for the score. |
num.trees.per.proj |
an integer, the number of trees per projection. |
min.node.size |
the minimum number of observations in a leaf of a tree. |
n.cores |
an integer, the number of cores to use. |
projection.function |
a function providing the user-specific projections. |
a vector made of the scores for each imputation method.
Iscores: compute the imputation KL-based scoring rules
Iscores( imputations, methods, X.NA, m = length(imputations[[1]]), num.proj = 100, num.trees.per.proj = 5, min.node.size = 10, n.cores = 1, projection.function = NULL, rescale = TRUE )
imputations |
a list of lists of imputation matrices, each containing no missing values and having the same size as X.NA. |
methods |
a vector of characters indicating which methods are considered for imputations. It should have the same length as the list imputations. |
X.NA |
a matrix containing missing values, the data to impute. |
m |
the number of multiple imputations to consider, defaulting to the number of provided multiple imputations. |
num.proj |
an integer specifying the number of projections to consider for the score. |
num.trees.per.proj |
an integer, the number of trees per projection. |
min.node.size |
the minimum number of observations in a leaf of a tree. |
n.cores |
an integer, the number of cores to use. |
projection.function |
a function providing the user-specific projections. |
rescale |
a boolean, TRUE if the scores should be rescaled such that the max score is 0. |
a vector made of the scores for each imputation method.
# simulate two independent Gaussian columns
n <- 100
X <- cbind(rnorm(n), rnorm(n))

# set roughly 20% of the first column to NA
X.NA <- X
X.NA[, 1] <- ifelse(stats::runif(n) <= 0.2, NA, X[, 1])

imputations <- list()

# method 1: five identical mean imputations
imputations[[1]] <- lapply(1:5, function(i) {
  X.loc <- X.NA
  X.loc[is.na(X.NA[, 1]), 1] <- mean(X.NA[, 1], na.rm = TRUE)
  return(X.loc)
})

# method 2: five imputations drawn from the observed values
imputations[[2]] <- lapply(1:5, function(i) {
  X.loc <- X.NA
  X.loc[is.na(X.NA[, 1]), 1] <- sample(X.NA[!is.na(X.NA[, 1]), 1],
                                       size = sum(is.na(X.NA[, 1])),
                                       replace = TRUE)
  return(X.loc)
})

methods <- c("mean", "sample")

# score both methods with a small number of projections
Iscores(imputations, methods, X.NA, num.proj = 5)
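The `rescale` argument is described as shifting the scores so that the maximum is 0. A minimal sketch of that step follows, assuming the shift is a plain subtraction of the maximum and that higher scores indicate better imputations. The raw values are made up for illustration and are not output of `Iscores`.

```r
# Hypothetical raw scores for two methods (made-up values).
raw <- c(mean = -1.8, sample = -0.3)

# Shift so that the best method has score 0 (assumed rescaling rule).
rescaled <- raw - max(raw)

# Name of the method with the highest score.
best <- names(which.max(rescaled))
```

After such a shift, each remaining score reads as the gap to the best method under comparison.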
Sampling of Projections
sample.vars.proj(ids.x.na, X, projection.function = NULL, normal.proj = T)
ids.x.na |
a vector of indices corresponding to NA in the given missingness pattern. |
X |
a matrix of the observed data containing missing values. |
projection.function |
a function providing the user-specific projections. |
normal.proj |
a boolean. If TRUE, sample variables from the NA entries of the pattern and additionally from the non-NA entries. If FALSE, sample only from the NA entries of the pattern. |
a vector of variables corresponding to the projection.
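The effect of `normal.proj` can be pictured with a small base R sketch. It is an illustration under assumptions, not the package's sampling scheme: the helper `sample_projection_sketch`, its argument `p` (the number of columns), and the uniform choice of subset sizes are all hypothetical.

```r
# Hypothetical helper, not part of the package: draw a random subset of
# variables for one projection, given the indices that are NA in a pattern.
sample_projection_sketch <- function(ids.x.na, p, normal.proj = TRUE) {
  # always keep at least one variable that is missing in the pattern
  k.na <- sample.int(length(ids.x.na), 1)
  vars.na <- ids.x.na[sample.int(length(ids.x.na), k.na)]
  if (!normal.proj) {
    return(sort(vars.na))
  }
  # optionally add variables that are observed in the pattern
  ids.obs <- setdiff(seq_len(p), ids.x.na)
  if (length(ids.obs) == 0) {
    return(sort(vars.na))
  }
  k.obs <- sample.int(length(ids.obs) + 1, 1) - 1  # between 0 and length(ids.obs)
  vars.obs <- ids.obs[sample.int(length(ids.obs), k.obs)]
  sort(c(vars.na, vars.obs))
}

set.seed(3)
vars.na.only <- sample_projection_sketch(c(1, 3), p = 5, normal.proj = FALSE)
vars.mixed   <- sample_projection_sketch(c(1, 3), p = 5, normal.proj = TRUE)
```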
Truncation of probability
truncProb(p)
p |
a numeric value between 0 and 1 to be truncated |
a numeric value, the truncated probability.
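Truncation keeps a probability strictly inside (0, 1) so that later logarithms and ratios stay finite. A minimal sketch follows, assuming symmetric clipping to [eps, 1 - eps]. The helper name `trunc_prob_sketch` and the bound `eps = 1e-9` are assumptions, since the help page does not state the bounds used.

```r
# Hypothetical helper: clip probabilities to [eps, 1 - eps].
# The value of eps is an assumption made for this sketch.
trunc_prob_sketch <- function(p, eps = 1e-9) {
  pmin(pmax(p, eps), 1 - eps)
}

clipped <- trunc_prob_sketch(c(0, 0.5, 1))
```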