Title: | Extended Inference for Lasso and Elastic-Net Regularized Cox and Generalized Linear Models |
---|---|
Description: | The c060 package provides additional functions to perform stability selection, model validation and parameter tuning for glmnet models. |
Authors: | Martin Sill, Thomas Hielscher, Manuela Zucknick, Natalia Becker. |
Maintainer: | Frederic Bertrand <[email protected]> |
License: | GPL-2 |
Version: | 0.3-0 |
Built: | 2025-01-31 03:53:41 UTC |
Source: | https://github.com/fbertran/c060 |
Evaluate the area under the ROC curve for a fitted model on new data. To be used as argument aggregation.fun
in peperr
call.
aggregation.auc(full.data=NULL, response, x, model, cplx=NULL, type=c("apparent", "noinf"), fullsample.attr = NULL, ...)
aggregation.auc(full.data=NULL, response, x, model, cplx=NULL, type=c("apparent", "noinf"), fullsample.attr = NULL, ...)
full.data |
passed from |
response |
vector of binary response. |
x |
|
model |
model fitted as returned by a |
cplx |
passed from |
type |
character. |
fullsample.attr |
passed from |
... |
additional arguments, passed to |
Area under the ROC curve is calculated based on internal glmnet:::auc
function from package glmnet
.
Scalar, indicating the area under the ROC curve.
Thomas Hielscher \ [email protected]
## Not run: # binomial model - classification library(c060) library(gridExtra) library(ggplot2) set.seed(0815) x <- matrix(rnorm(100*20),100,20) y <- sample(0:1,100,replace=TRUE) peperr_obj <- peperr(response=y, x=x, fit.fun=fit.glmnet, args.fit=list(family="binomial"), complexity=complexity.glmnet, args.complexity=list(nfolds=10, family="binomial"), trace=F, RNG="fixed",seed=0815, # aggregation.fun=c060:::aggregation.misclass, # aggregation.fun=c060:::aggregation.brier, aggregation.fun=c060:::aggregation.auc, indices=resample.indices(n=nrow(x), sample.n = 100, method = "sub632")) tmp <- data.frame(grp="",error=unlist(peperr_obj$sample.error)) errs <- data.frame(error=c(perr(peperr_obj,"resample"), perr(peperr_obj,"632p"),perr(peperr_obj,"app"), perr(peperr_obj,"nullmodel")), col = c("red","blue","green","brown"), row.names=c("mean\nout-of-bag",".632plus","apparent","null model")) p <- ggplot(tmp, aes(grp,error)) pg <- p + geom_boxplot(outlier.colour = rgb(0,0,0,0), outlier.size=0) + geom_jitter(position=position_jitter(width=.1)) + theme_bw() + scale_y_continuous("AUC") + scale_x_discrete("") + geom_hline(aes(yintercept=error, colour=col), data=errs, show_guide=T) + scale_colour_identity("error type", guide = "legend", breaks=errs$col, labels=rownames(errs)) + ggtitle("AUC \n in bootstrap samples ") p2 <- ggplot(data.frame(complx=peperr_obj$sample.complexity), aes(x=complx)) pg2 <- p2 + geom_histogram(binwidth = 0.02, fill = "white", colour="black") + theme_bw()+ xlab(expression(lambda)) + ylab("frequency") + geom_vline(xintercept=peperr_obj$selected.complexity, colour="red") + ggtitle("Selected complexity \n in bootstrap samples") + ggplot2::annotate("text", x = 0.12, y = -0.5, label = "full data", colour="red", size=4) grid.arrange(pg2, pg, ncol=2) ## End(Not run)
## Not run: # binomial model - classification library(c060) library(gridExtra) library(ggplot2) set.seed(0815) x <- matrix(rnorm(100*20),100,20) y <- sample(0:1,100,replace=TRUE) peperr_obj <- peperr(response=y, x=x, fit.fun=fit.glmnet, args.fit=list(family="binomial"), complexity=complexity.glmnet, args.complexity=list(nfolds=10, family="binomial"), trace=F, RNG="fixed",seed=0815, # aggregation.fun=c060:::aggregation.misclass, # aggregation.fun=c060:::aggregation.brier, aggregation.fun=c060:::aggregation.auc, indices=resample.indices(n=nrow(x), sample.n = 100, method = "sub632")) tmp <- data.frame(grp="",error=unlist(peperr_obj$sample.error)) errs <- data.frame(error=c(perr(peperr_obj,"resample"), perr(peperr_obj,"632p"),perr(peperr_obj,"app"), perr(peperr_obj,"nullmodel")), col = c("red","blue","green","brown"), row.names=c("mean\nout-of-bag",".632plus","apparent","null model")) p <- ggplot(tmp, aes(grp,error)) pg <- p + geom_boxplot(outlier.colour = rgb(0,0,0,0), outlier.size=0) + geom_jitter(position=position_jitter(width=.1)) + theme_bw() + scale_y_continuous("AUC") + scale_x_discrete("") + geom_hline(aes(yintercept=error, colour=col), data=errs, show_guide=T) + scale_colour_identity("error type", guide = "legend", breaks=errs$col, labels=rownames(errs)) + ggtitle("AUC \n in bootstrap samples ") p2 <- ggplot(data.frame(complx=peperr_obj$sample.complexity), aes(x=complx)) pg2 <- p2 + geom_histogram(binwidth = 0.02, fill = "white", colour="black") + theme_bw()+ xlab(expression(lambda)) + ylab("frequency") + geom_vline(xintercept=peperr_obj$selected.complexity, colour="red") + ggtitle("Selected complexity \n in bootstrap samples") + ggplot2::annotate("text", x = 0.12, y = -0.5, label = "full data", colour="red", size=4) grid.arrange(pg2, pg, ncol=2) ## End(Not run)
Get balanced folds for cross validation, which are used for tuning penalization parameters
balancedFolds(class.column.factor, cross.outer)
balancedFolds(class.column.factor, cross.outer)
class.column.factor |
class labels of length n |
cross.outer |
number of folds |
permutated.cut |
vector of length n, indicating the fold belongs to |
model |
model list
|
Natalia Becker natalia.becker at dkfz.de
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22. doi:10.18637/jss.v062.i05
Get coefficients for a model after applying interval search for tuning parameters
## S3 method for class 'sum.intsearch' coef(object,...)
## S3 method for class 'sum.intsearch' coef(object,...)
object |
an object as returned by the function |
... |
additional argument(s) |
named vector of non-zero coeficients for the optimal lambda
Natalia Becker \ [email protected]
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22. doi:10.18637/jss.v062.i05
EPSGO
, summary.intsearch
,plot.sum.intsearch
Determines the amount of shrinkage for a penalized regression model fitted by glmnet via cross-validation, conforming to the calling convention required by argument complexity
in peperr
call.
complexity.glmnet(response, x, full.data, ...)
complexity.glmnet(response, x, full.data, ...)
response |
a survival object (with |
x |
|
full.data |
data frame containing response and covariates of the full data set. |
... |
additional arguments passed to |
Function is basically a wrapper for cv.glmnet
of package glmnet
. A n-fold cross-validation (default n=10) is performed to determine the optimal penalty lambda.
For Cox PH regression models the deviance based on penalized partial log-likelihood is used as loss function. For binary endpoints other loss functions are available as well (see type.measure
). Deviance is default. Calling peperr
, the default arguments of cv.glmnet
can be changed by passing a named list containing these as argument args.complexity
.
Note that only penalized Cox PH (family="cox"
) and logistic regression models (family="binomial"
) are sensible for prediction error
evaluation with package peperr
.
Scalar value giving the optimal lambda.
Thomas Hielscher \ [email protected]
Friedman, J., Hastie, T. and Tibshirani, R. (2008)
Regularization Paths for Generalized Linear Models via Coordinate
Descent, https://web.stanford.edu/~hastie/Papers/glmnet.pdf
Journal of Statistical Software, Vol. 33(1), 1-22 Feb 2010
https://www.jstatsoft.org/v33/i01/
Simon, N., Friedman, J., Hastie, T., Tibshirani, R. (2011)
Regularization Paths for Cox's Proportional Hazards Model via
Coordinate Descent, Journal of Statistical Software, Vol. 39(5)
1-13
https://www.jstatsoft.org/v39/i05/
Porzelius, C., Binder, H., and Schumacher, M. (2009)
Parallelized prediction error estimation for evaluation of high-dimensional models,
Bioinformatics, Vol. 25(6), 827-829.
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22.
doi:10.18637/jss.v062.i05
Finds an optimal solution for the Q.func function.
epsgo(Q.func, bounds, round.n=5, parms.coding="none", fminlower=0, flag.find.one.min =FALSE, show=c("none", "final", "all"), N= NULL, maxevals = 500, pdf.name=NULL, pdf.width=12, pdf.height=12, my.mfrow=c(1,1), verbose=TRUE, seed=123, ... )
epsgo(Q.func, bounds, round.n=5, parms.coding="none", fminlower=0, flag.find.one.min =FALSE, show=c("none", "final", "all"), N= NULL, maxevals = 500, pdf.name=NULL, pdf.width=12, pdf.height=12, my.mfrow=c(1,1), verbose=TRUE, seed=123, ... )
Q.func |
name of the function to be minimized. |
bounds |
bounds for parameters |
round.n |
number of digits after comma, default: 5 |
parms.coding |
parmeters coding: none or log2, default: none. |
fminlower |
minimal value for the function Q.func, default is 0. |
flag.find.one.min |
do you want to find one min value and stop? Default: FALSE |
show |
show plots of DIRECT algorithm: none, final iteration, all iterations. Default: none |
N |
define the number of start points, see details. |
maxevals |
the maximum number of DIRECT function evaluations, default: 500. |
pdf.name |
pdf name |
pdf.width |
default: 12 |
pdf.height |
default: 12 |
my.mfrow |
default: c(1,1) |
verbose |
verbose? default: TRUE. |
seed |
seed |
... |
additional argument(s) |
if the number of start points (N) is not defined by the user, it will be defined dependent on the dimensionality of the parameter space. N=10D+1, where D is the number of parameters, but for high dimensional parameter space with more than 6 dimensions, the initial set is restricted to 65. However for one-dimensional parameter space the N is set to 21 due to stability reasons.
The idea of EPSGO (Efficient Parameter Selection via Global Optimization): Beginning from an intial Latin hypercube sampling containing N starting points we train an Online GP, look for the point with the maximal expected improvement, sample there and update the Gaussian Process(GP). Thereby it is not so important that GP really correctly models the error surface of the SVM in parameter space, but that it can give a us information about potentially interesting points in parameter space where we should sample next. We continue with sampling points until some convergence criterion is met.
DIRECT is a sampling algorithm which requires no knowledge of the objective function gradient. Instead, the algorithm samples points in the domain, and uses the information it has obtained to decide where to search next. The DIRECT algorithm will globally converge to the maximal value of the objective function. The name DIRECT comes from the shortening of the phrase 'DIviding RECTangles', which describes the way the algorithm moves towards the optimum.
The code source was adopted from MATLAB originals, special thanks to Holger Froehlich.
fmin |
minimal value of Q.func on the interval defined by bounds. |
xmin |
corresponding parameters for the minimum |
iter |
number of iterations |
neval |
number of visited points |
maxevals |
the maximum number of DIRECT function evaluations |
seed |
seed |
bounds |
bounds for parameters |
Q.func |
name of the function to be minimized. |
points.fmin |
the set of points with the same fmin |
Xtrain |
visited points |
Ytrain |
the output of Q.func at visited points Xtrain |
gp.seed |
seed for Gaussian Process |
model.list |
detailed information of the search process |
Natalia Becker natalia.becker at dkfz.de
Froehlich, H. and Zell, A. (2005) "Effcient parameter selection for support vector
machines in classification and regression via model-based global optimization"
In Proc. Int. Joint Conf. Neural Networks, 1431-1438 .
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22. doi:10.18637/jss.v062.i05
## Not run: set.seed(1010) n=1000;p=100 nzc=trunc(p/10) x=matrix(rnorm(n*p),n,p) beta=rnorm(nzc) fx= x[,seq(nzc)] %*% beta eps=rnorm(n)*5 y=drop(fx+eps) px=exp(fx) px=px/(1+px) ly=rbinom(n=length(px),prob=px,size=1) set.seed(1011) nfolds = 10 set.seed(1234) foldid <- balancedFolds(class.column.factor=y.classes, cross.outer=nfolds) # y - binomial y.classes<-ifelse(y>= median(y),1, 0) bounds <- t(data.frame(alpha=c(0, 1))) colnames(bounds)<-c("lower","upper") fit <- epsgo(Q.func="tune.glmnet.interval", bounds=bounds, parms.coding="none", seed = 1234, show="none", fminlower = -100, x = x, y = y.classes, family = "binomial", foldid = foldid, type.min = "lambda.1se", type.measure = "mse") summary(fit) # y - multinomial: low - low 25%, middle - (25,75)-quantiles, high - larger 75%. y.classes<-ifelse(y <= quantile(y,0.25),1, ifelse(y >= quantile(y,0.75),3, 2)) bounds <- t(data.frame(alpha=c(0, 1))) colnames(bounds)<-c("lower","upper") fit <- epsgo(Q.func="tune.glmnet.interval", bounds=bounds, parms.coding="none", seed = 1234, show="none", fminlower = -100, x = x, y = y.classes, family = "multinomial", foldid = foldid, type.min = "lambda.1se", type.measure = "mse") summary(fit) ##poisson N=500; p=20 nzc=5 x=matrix(rnorm(N*p),N,p) beta=rnorm(nzc) f = x[,seq(nzc)] mu=exp(f) y.classes=rpois(N,mu) nfolds = 10 set.seed(1234) foldid <- balancedFolds(class.column.factor=y.classes, cross.outer=nfolds) fit <- epsgo(Q.func="tune.glmnet.interval", bounds=bounds, parms.coding="none", seed = 1234, show="none", fminlower = -100, x = x, y = y.classes, family = "poisson", foldid = foldid, type.min = "lambda.1se", type.measure = "mse") summary(fit) #gaussian set.seed(1234) x=matrix(rnorm(100*1000,0,1),100,1000) y <- x[1:100,1:1000]%*%c(rep(2,5),rep(-2,5),rep(.1,990)) foldid <- rep(1:10,each=10) fit <- epsgo(Q.func="tune.glmnet.interval", bounds=bounds, parms.coding="none", seed = 1234, show="none", fminlower = -100, x = x, y = y, family = "gaussian", foldid = foldid, type.min = "lambda.1se", type.measure = "mse") summary(fit) # y - cox in vingette ## End(Not run)
## Not run: set.seed(1010) n=1000;p=100 nzc=trunc(p/10) x=matrix(rnorm(n*p),n,p) beta=rnorm(nzc) fx= x[,seq(nzc)] %*% beta eps=rnorm(n)*5 y=drop(fx+eps) px=exp(fx) px=px/(1+px) ly=rbinom(n=length(px),prob=px,size=1) set.seed(1011) nfolds = 10 set.seed(1234) foldid <- balancedFolds(class.column.factor=y.classes, cross.outer=nfolds) # y - binomial y.classes<-ifelse(y>= median(y),1, 0) bounds <- t(data.frame(alpha=c(0, 1))) colnames(bounds)<-c("lower","upper") fit <- epsgo(Q.func="tune.glmnet.interval", bounds=bounds, parms.coding="none", seed = 1234, show="none", fminlower = -100, x = x, y = y.classes, family = "binomial", foldid = foldid, type.min = "lambda.1se", type.measure = "mse") summary(fit) # y - multinomial: low - low 25%, middle - (25,75)-quantiles, high - larger 75%. y.classes<-ifelse(y <= quantile(y,0.25),1, ifelse(y >= quantile(y,0.75),3, 2)) bounds <- t(data.frame(alpha=c(0, 1))) colnames(bounds)<-c("lower","upper") fit <- epsgo(Q.func="tune.glmnet.interval", bounds=bounds, parms.coding="none", seed = 1234, show="none", fminlower = -100, x = x, y = y.classes, family = "multinomial", foldid = foldid, type.min = "lambda.1se", type.measure = "mse") summary(fit) ##poisson N=500; p=20 nzc=5 x=matrix(rnorm(N*p),N,p) beta=rnorm(nzc) f = x[,seq(nzc)] mu=exp(f) y.classes=rpois(N,mu) nfolds = 10 set.seed(1234) foldid <- balancedFolds(class.column.factor=y.classes, cross.outer=nfolds) fit <- epsgo(Q.func="tune.glmnet.interval", bounds=bounds, parms.coding="none", seed = 1234, show="none", fminlower = -100, x = x, y = y.classes, family = "poisson", foldid = foldid, type.min = "lambda.1se", type.measure = "mse") summary(fit) #gaussian set.seed(1234) x=matrix(rnorm(100*1000,0,1),100,1000) y <- x[1:100,1:1000]%*%c(rep(2,5),rep(-2,5),rep(.1,990)) foldid <- rep(1:10,each=10) fit <- epsgo(Q.func="tune.glmnet.interval", bounds=bounds, parms.coding="none", seed = 1234, show="none", fminlower = -100, x = x, y = y, family = "gaussian", foldid = foldid, type.min = "lambda.1se", type.measure = "mse") summary(fit) # y - cox in vingette ## End(Not run)
glmnet
Interface for fitting penalized regression models for binary of survival endpoint using glmnet
, conforming to the requirements for argument fit.fun
in peperr
call.
fit.glmnet(response, x, cplx, ...)
fit.glmnet(response, x, cplx, ...)
response |
a survival object (with |
x |
|
cplx |
lambda penalty value. |
... |
additional arguments passed to |
Function is basically a wrapper for glmnet
of package glmnet.
Note that only penalized Cox PH (family="cox"
) and logistic regression models (family="binomial"
) are sensible for prediction error
evaluation with package peperr
.
glmnet object
Thomas Hielscher \ [email protected]
Friedman, J., Hastie, T. and Tibshirani, R. (2008)
Regularization Paths for Generalized Linear Models via Coordinate
Descent, https://web.stanford.edu/~hastie/Papers/glmnet.pdf
Journal of Statistical Software, Vol. 33(1), 1-22 Feb 2010
https://www.jstatsoft.org/v33/i01/
Simon, N., Friedman, J., Hastie, T., Tibshirani, R. (2011)
Regularization Paths for Cox's Proportional Hazards Model via
Coordinate Descent, Journal of Statistical Software, Vol. 39(5)
1-13
https://www.jstatsoft.org/v39/i05/
Porzelius, C., Binder, H., and Schumacher, M. (2009)
Parallelized prediction error estimation for evaluation of high-dimensional models,
Bioinformatics, Vol. 25(6), 827-829.
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22.
doi:10.18637/jss.v062.i05
Extracts the predictive partial log-likelihood from a glmnet Cox PH model fit.
## S3 method for class 'coxnet' PLL(object, newdata, newtime, newstatus, complexity, ...)
## S3 method for class 'coxnet' PLL(object, newdata, newtime, newstatus, complexity, ...)
object |
fitted model of class |
newdata |
|
newtime |
|
newstatus |
|
complexity |
lambda penalty value. |
... |
additional arguments, not used. |
Used by function peperr
, if function fit.glmnet
and family="cox"
is used for model fit, which gives a class coxnet
object.
This is basically a wrapper based on the coxnet.deviance
function from package glmnet
.
Vector of length n_new
Thomas Hielscher \ [email protected]
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22. doi:10.18637/jss.v062.i05
Creates several plots showing the coefficient path for the final model of a cv.glmnet fit and highlights the path of a pre-specified set of variables within the coefficient path.
Plot.coef.glmnet(cvfit, betas)
Plot.coef.glmnet(cvfit, betas)
cvfit |
an object of class "cv.glmnet" as returned by the function |
betas |
a vector of names of variables; must be a subset of rownames(coef(cvfit)). |
a list of four objects
stable |
a vector giving the positions of the estimated stable variables |
lambda |
the penalization parameter used for the stability selection |
lpos |
the position of the penalization parameter in the regularization path |
error |
the desired type I error level w.r.t. to the chosen type I error rate |
type |
the type I error rate |
Manuela Zucknick \ [email protected]
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22. doi:10.18637/jss.v062.i05
## Not run: set.seed(1010) n=1000;p=100 nzc=trunc(p/10) x=matrix(rnorm(n*p),n,p) beta=rnorm(nzc) fx= x[,seq(nzc)] %*% beta eps=rnorm(n)*5 y=drop(fx+eps) px=exp(fx) px=px/(1+px) ly=rbinom(n=length(px),prob=px,size=1) set.seed(1011) cvob1=cv.glmnet(x,y) Plot.coef.glmnet(cvob1, c("V1","V100")) ## End(Not run)
## Not run: set.seed(1010) n=1000;p=100 nzc=trunc(p/10) x=matrix(rnorm(n*p),n,p) beta=rnorm(nzc) fx= x[,seq(nzc)] %*% beta eps=rnorm(n)*5 y=drop(fx+eps) px=exp(fx) px=px/(1+px) ly=rbinom(n=length(px),prob=px,size=1) set.seed(1011) cvob1=cv.glmnet(x,y) Plot.coef.glmnet(cvob1, c("V1","V100")) ## End(Not run)
Plots individual and aggregated prediction error estimates based on bootstrap samples.
Plot.peperr.curves(x, at.risk=TRUE, allErrors=FALSE, bootRuns=FALSE, bootQuants=TRUE, bootQuants.level=0.95, leg.cex=0.7,...)
Plot.peperr.curves(x, at.risk=TRUE, allErrors=FALSE, bootRuns=FALSE, bootQuants=TRUE, bootQuants.level=0.95, leg.cex=0.7,...)
x |
|
at.risk |
number at risk to be display. default is TRUE. |
allErrors |
Display .632, no information and average out-of-bag error in addition. default is FALSE. |
bootRuns |
Display individual out-of-bag bootstrap samples. default is FALSE. |
bootQuants |
Display pointwise out-of-bag bootstrap quantiles as shaded area. default is TRUE. |
bootQuants.level |
Quantile probabilities for pointwise out-of-bag bootstrap quantiles. default is 0.95, i.e. 2.5% and 97.5% quantiles. |
leg.cex |
size of legend text |
... |
additional arguments, not used. |
This function is literally taken from plot.peperr
in the peperr
package.
The display of prediction error curves is adapted to allow for numbers at risk and pointwise bootstrap quantiles.
Thomas Hielscher [email protected]
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22. doi:10.18637/jss.v062.i05
## Not run: # example from glmnet package set.seed(10101) library(glmnet) library(survival) library(peperr) N=1000;p=30 nzc=p/3 x=matrix(rnorm(N*p),N,p) beta=rnorm(nzc) fx=x[,seq(nzc)] hx=exp(fx) ty=rexp(N,hx) tcens=rbinom(n=N,prob=.3,size=1)# censoring indicator y=Surv(ty,1-tcens) peperr.object <- peperr(response=y, x=x, fit.fun=fit.glmnet, args.fit=list(family="cox"), complexity=complexity.glmnet, args.complexity=list(family="cox",nfolds=10), indices=resample.indices(n=N, method="sub632", sample.n=10)) # pointwise bootstrap quantiles and all error types Plot.peperr.curves(peperr.object, allErrors=TRUE) # individual bootstrap runs and selected error types Plot.peperr.curves(peperr.object, allErrors=FALSE, bootRuns=TRUE) ## End(Not run)
## Not run: # example from glmnet package set.seed(10101) library(glmnet) library(survival) library(peperr) N=1000;p=30 nzc=p/3 x=matrix(rnorm(N*p),N,p) beta=rnorm(nzc) fx=x[,seq(nzc)] hx=exp(fx) ty=rexp(N,hx) tcens=rbinom(n=N,prob=.3,size=1)# censoring indicator y=Surv(ty,1-tcens) peperr.object <- peperr(response=y, x=x, fit.fun=fit.glmnet, args.fit=list(family="cox"), complexity=complexity.glmnet, args.complexity=list(family="cox",nfolds=10), indices=resample.indices(n=N, method="sub632", sample.n=10)) # pointwise bootstrap quantiles and all error types Plot.peperr.curves(peperr.object, allErrors=TRUE) # individual bootstrap runs and selected error types Plot.peperr.curves(peperr.object, allErrors=FALSE, bootRuns=TRUE) ## End(Not run)
Given a desired family-wise error rate (FWER) and a stability path calculated with stability.path
the function selects an stable set of features and plots the stability path and the corresponding regularization path.
## S3 method for class 'stabpath' plot(x, error=0.05, type=c("pfer","pcer"), pi_thr=0.6, xvar=c("lambda", "norm", "dev"), col.all="black", col.sel="red", ...)
## S3 method for class 'stabpath' plot(x, error=0.05, type=c("pfer","pcer"), pi_thr=0.6, xvar=c("lambda", "norm", "dev"), col.all="black", col.sel="red", ...)
x |
an object of class "stabpath" as returned by the function |
error |
the desired type I error level w.r.t. to the chosen type I error rate. |
type |
The type I error rate used for controlling the number falsely selected variables. If |
pi_thr |
the threshold used for the stability selection, should be in the range of 0.5 > pi_thr < 1. |
xvar |
the variable used for the xaxis, e.g. for "lambda" the selection probabilities are plotted along the log of the penalization parameters, for "norm" along the L1-norm and for "dev" along the fraction of explained deviance. |
col.all |
the color used for the variables that are not in the estimated stable set |
col.sel |
the color used for the variables in the estimated stable set |
... |
further arguments that are passed to matplot |
a list of four objects
stable |
a vector giving the positions of the estimated stable variables |
lambda |
the penalization parameter used for the stability selection |
lpos |
the position of the penalization parameter in the regularization path |
error |
the desired type I error level w.r.t. to the chosen type I error rate |
type |
the type I error rate |
Martin Sill \ [email protected]
Meinshausen N. and Buehlmann P. (2010), Stability Selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology) Volume 72, Issue 4, pages 417-473.
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22.
doi:10.18637/jss.v062.i05
## Not run: #gaussian set.seed(1234) x=matrix(rnorm(100*1000,0,1),100,1000) y <- x[1:100,1:1000]%*%c(rep(2,5),rep(-2,5),rep(.1,990)) res <- stabpath(y,x,weakness=1,mc.cores=2) plot(res,error=.5,type='pfer') ## End(Not run)
## Not run: #gaussian set.seed(1234) x=matrix(rnorm(100*1000,0,1),100,1000) y <- x[1:100,1:1000]%*%c(rep(2,5),rep(-2,5),rep(.1,990)) res <- stabpath(y,x,weakness=1,mc.cores=2) plot(res,error=.5,type='pfer') ## End(Not run)
Produces a plot for summary object of a fitted interval search model. Plot 'visited' points against iteration steps. start.N points are initial points selected before interval search starts.
## S3 method for class 'sum.intsearch' plot(x,type="summary",startN=21,... )
## S3 method for class 'sum.intsearch' plot(x,type="summary",startN=21,... )
x |
an object of class |
type |
type of plot to be drawn, |
startN |
number of initial points. Needed if |
... |
additional argument(s) |
Natalia Becker \ [email protected]
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22. doi:10.18637/jss.v062.i05
Extracts predicted survival probabilities from survival model fitted by glmnet, providing an interface as required by pmpec
.
## S3 method for class 'coxnet' predictProb(object, response, x, times, complexity, ...)
## S3 method for class 'coxnet' predictProb(object, response, x, times, complexity, ...)
object |
a fitted model of class |
response |
a two-column matrix with columns named 'time' and 'status'. The latter is a binary variable, with '1' indicating death, and '0' indicating right censored. The function |
x |
|
times |
vector of evaluation time points. |
complexity |
lambda penalty value. |
... |
additional arguments, currently not used. |
Matrix with probabilities for each evaluation time point in times
(columns) and each new observation (rows).
Thomas Hielscher \ [email protected]
Friedman, J., Hastie, T. and Tibshirani, R. (2008)
Regularization Paths for Generalized Linear Models via Coordinate
Descent, https://web.stanford.edu/~hastie/Papers/glmnet.pdf
Journal of Statistical Software, Vol. 33(1), 1-22 Feb 2010
https://www.jstatsoft.org/v33/i01/
Simon, N., Friedman, J., Hastie, T., Tibshirani, R. (2011)
Regularization Paths for Cox's Proportional Hazards Model via
Coordinate Descent, Journal of Statistical Software, Vol. 39(5)
1-13
https://www.jstatsoft.org/v39/i05/
Porzelius, C., Binder, H., and Schumacher, M. (2009)
Parallelized prediction error estimation for evaluation of high-dimensional models,
Bioinformatics, Vol. 25(6), 827-829.
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22.
doi:10.18637/jss.v062.i05
predictProb.glmnet
,peperr
, glmnet
Extracts predicted survival probabilities from survival model fitted by glmnet, providing an interface as required by pmpec
.
## S3 method for class 'glmnet' predictProb(object, response, x, times, complexity, ...)
## S3 method for class 'glmnet' predictProb(object, response, x, times, complexity, ...)
object |
a fitted model of class |
response |
response variable. Quantitative for |
x |
|
times |
vector of evaluation time points. |
complexity |
lambda penalty value. |
... |
additional arguments, currently not used. |
Matrix with probabilities for each evaluation time point in times
(columns) and each new observation (rows).
Thomas Hielscher \ [email protected]
Friedman, J., Hastie, T. and Tibshirani, R. (2008)
Regularization Paths for Generalized Linear Models via Coordinate
Descent, https://web.stanford.edu/~hastie/Papers/glmnet.pdf
Journal of Statistical Software, Vol. 33(1), 1-22 Feb 2010
https://www.jstatsoft.org/v33/i01/
Simon, N., Friedman, J., Hastie, T., Tibshirani, R. (2011)
Regularization Paths for Cox's Proportional Hazards Model via
Coordinate Descent, Journal of Statistical Software, Vol. 39(5)
1-13
https://www.jstatsoft.org/v39/i05/
Porzelius, C., Binder, H., and Schumacher, M. (2009)
Parallelized prediction error estimation for evaluation of high-dimensional models,
Bioinformatics, Vol. 25(6), 827-829.
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22.
doi:10.18637/jss.v062.i05
predictProb.coxnet
, peperr
, glmnet
The function calculates the stability path for glmnet models, e.g. the selection probabilities of the features along the range of regularization parameters.
stabpath(y,x,size=0.632,steps=100,weakness=1,mc.cores=getOption("mc.cores", 2L),...)
stabpath(y,x,size=0.632,steps=100,weakness=1,mc.cores=getOption("mc.cores", 2L),...)
y |
response variable. Like for the glment function: Quantitative for |
x |
input matrix. Like for the glmnet function:
of dimension nobs x nvars; each row is an
observation vector. Can be in sparse matrix format (inherit
from class |
size |
proportion of samples drawn in every subsample used for the stability selection. |
steps |
number of subsamples used for the stability selection. |
weakness |
weakness parameter used for the randomised lasso as described in Meinshausen and B\"uhlmann (2010). For each subsample the features are reweighted by a random weight uniformly sampled in [weakness,1]. This additional randomisation leads to a more consistent estimation of the stable set of features. |
mc.cores |
number of cores used for the parallelization. For unix like system the parallelization is done by forking using the function |
... |
further arguments that are passed to the |
an object of class "stabpath", which is a list of three objects
fit |
the fit object of class "glmnet" as returned from the glmnet function when applied to the complete data set. |
stabpath |
a matrix which represents the stability path. |
qs |
a vector holding the values of the average number of non-zero coefficients w.r.t to the lambdas in the regularization path. |
Martin Sill [email protected]
Meinshausen N. and B\"uhlmann P. (2010), Stability Selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology) Volume 72, Issue 4, pages 417–473.
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22. doi:10.18637/jss.v062.i05
## Not run: #gaussian set.seed(1234) x <- matrix(rnorm(100*1000,0,1),100,1000) y <- x[1:100,1:1000]%*% c(rep(2,5),rep(-2,5),rep(.1,990)) res <- stabpath(y,x,weakness=1,mc.cores=2) plot(res) #binomial y=sample(1:2,100,replace=TRUE) res <- stabpath(y,x,weakness=1,mc.cores=2,family="binomial") plot(res) #multinomial y=sample(1:4,100,replace=TRUE) res <- stabpath(y,x,weakness=1,mc.cores=2,family="multinomial") plot(res) #poisson N=100; p=1000 nzc=5 x=matrix(rnorm(N*p),N,p) beta=rnorm(nzc) f = x[,seq(nzc)]%*%beta mu=exp(f) y=rpois(N,mu) res <- stabpath(y,x,weakness=1,mc.cores=2,family="poisson") plot(res) #Cox library(survival) set.seed(10101) N=100;p=1000 nzc=p/3 x=matrix(rnorm(N*p),N,p) beta=rnorm(nzc) fx=x[,seq(nzc)]%*%beta/3 hx=exp(fx) ty=rexp(N,hx) tcens=rbinom(n=N,prob=.3,size=1) y=cbind(time=ty,status=1-tcens) res <- stabpath(y,x,weakness=1,mc.cores=2,family="cox") plot(res) ## End(Not run)
## Not run: #gaussian set.seed(1234) x <- matrix(rnorm(100*1000,0,1),100,1000) y <- x[1:100,1:1000]%*% c(rep(2,5),rep(-2,5),rep(.1,990)) res <- stabpath(y,x,weakness=1,mc.cores=2) plot(res) #binomial y=sample(1:2,100,replace=TRUE) res <- stabpath(y,x,weakness=1,mc.cores=2,family="binomial") plot(res) #multinomial y=sample(1:4,100,replace=TRUE) res <- stabpath(y,x,weakness=1,mc.cores=2,family="multinomial") plot(res) #poisson N=100; p=1000 nzc=5 x=matrix(rnorm(N*p),N,p) beta=rnorm(nzc) f = x[,seq(nzc)]%*%beta mu=exp(f) y=rpois(N,mu) res <- stabpath(y,x,weakness=1,mc.cores=2,family="poisson") plot(res) #Cox library(survival) set.seed(10101) N=100;p=1000 nzc=p/3 x=matrix(rnorm(N*p),N,p) beta=rnorm(nzc) fx=x[,seq(nzc)]%*%beta/3 hx=exp(fx) ty=rexp(N,hx) tcens=rbinom(n=N,prob=.3,size=1) y=cbind(time=ty,status=1-tcens) res <- stabpath(y,x,weakness=1,mc.cores=2,family="cox") plot(res) ## End(Not run)
Given a desired type I error rate and a stability path calculated with stability.path
the function selects a stable set of variables.
stabsel(x,error=0.05,type=c("pfer","pcer"),pi_thr=0.6)
stabsel(x,error=0.05,type=c("pfer","pcer"),pi_thr=0.6)
x |
an object of class "stabpath" as returned by the function |
error |
the desired type I error level w.r.t. to the chosen type I error rate. |
type |
The type I error rate used for controlling the number falsely selected variables. If |
pi_thr |
the threshold used for the stability selection, should be in the range of $0.5 > pi_thr < 1$. |
a list of four objects
stable |
a vector giving the positions of the estimated stable variables |
lambda |
the penalization parameter used for the stability selection |
lpos |
the position of the penalization parameter in the regularization path |
error |
the desired type I error level w.r.t. to the chosen type I error rate |
type |
the type I error rate |
Martin Sill \ [email protected]
Meinshausen N. and B\"uhlmann P. (2010), Stability Selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology) Volume 72, Issue 4, pages 417–473.
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22. doi:10.18637/jss.v062.i05
## Not run: #gaussian set.seed(1234) x=matrix(rnorm(100*1000,0,1),100,1000) y <- x[1:100,1:1000]%*%c(rep(2,5),rep(-2,5),rep(.1,990)) res <- stabpath(y,x,weakness=1,mc.cores=2) stabsel(res,error=0.05,type="pfer") ## End(Not run)
## Not run: #gaussian set.seed(1234) x=matrix(rnorm(100*1000,0,1),100,1000) y <- x[1:100,1:1000]%*%c(rep(2,5),rep(-2,5),rep(.1,990)) res <- stabpath(y,x,weakness=1,mc.cores=2) stabsel(res,error=0.05,type="pfer") ## End(Not run)
Produces a summary of a fitted interval search model
## S3 method for class 'intsearch' summary(object,digits = max(3, getOption("digits") - 3), verbose=TRUE,first.n=5,...)
## S3 method for class 'intsearch' summary(object,digits = max(3, getOption("digits") - 3), verbose=TRUE,first.n=5,...)
object |
an object of class |
digits |
digits after the comma |
verbose |
default set to TRUE. |
first.n |
show first.n entries , default 5. |
... |
additional argument(s) |
A list of following elements
info |
a data frame of four objects for optimal models
|
opt.alpha |
an optimal value for alpha |
opt.lambda |
an optimal value for lambda |
opt.error |
an optimal value for error, hier minimal diviance |
opt.models |
a list of optimal models with the same optimal error |
Natalia Becker \ [email protected]
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22. doi:10.18637/jss.v062.i05
glmnet
objects. Wrapper function for glmnet
objects used by epsgo
function.
This function is mainly used within the function epsgo
tune.glmnet.interval(parms, x, y, weights, offset = NULL, lambda = NULL, type.measure = c("mse", "deviance", "class", "auc", "mae"), seed=12345, nfolds = 10, foldid=NULL, grouped = TRUE, type.min=c("lambda.min", "lambda.1se"), family, verbose=FALSE, ...)
tune.glmnet.interval(parms, x, y, weights, offset = NULL, lambda = NULL, type.measure = c("mse", "deviance", "class", "auc", "mae"), seed=12345, nfolds = 10, foldid=NULL, grouped = TRUE, type.min=c("lambda.min", "lambda.1se"), family, verbose=FALSE, ...)
parms |
tuning parameter alpha for |
x , y
|
x is a matrix where each row refers to a sample a each column refers to a gene; y is a factor which includes the class for each sample |
weights |
observation weights. Can be total counts if responses are proportion matrices. Default is 1 for each observation |
offset |
A vector of length nobs that is included in the linear predictor (a nobs x nc matrix for the "multinomial" family). Useful for the "poisson" family (e.g. log of exposure time), or for refining a model by starting at a current fit. Default is NULL. If supplied, then values must also be supplied to the predict function. |
lambda |
A user supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use with care. Do not supply a single value for lambda (for predictions after CV use predict() instead). Supply instead a decreasing sequence of lambda values. glmnet relies on its warms starts for speed, and its often faster to fit a whole path than compute a single fit. |
type.measure |
loss to use for cross-validation. Currently five options, not all available for all models. The default is type.measure="deviance", which uses squared-error for gaussian models (a.k.a type.measure="mse" there), deviance for logistic and poisson regression, and partial-likelihood for the Cox model. type.measure="class" applies to binomial and multinomial logistic regression only, and gives misclassification error. type.measure="auc" is for two-class logistic regression only, and gives area under the ROC curve. type.measure="mse" or type.measure="mae" (mean absolute error) can be used by all models except the "cox"; they measure the deviation from the fitted mean to the response. |
seed |
seed |
nfolds |
number of cross-validation's folds, default 10. |
foldid |
an optional vector of values between 1 and nfold identifying what fold each observation is in. If supplied, nfold can be missing. |
grouped |
This is an experimental argument, with default TRUE, and can be ignored by most users. For all models except the "cox", this refers to computing nfolds separate statistics, and then using their mean and estimated standard error to describe the CV curve. If grouped=FALSE, an error matrix is built up at the observation level from the predictions from the nfold fits, and then summarized (does not apply to type.measure="auc"). For the "cox" family, grouped=TRUE obtains the CV partial likelihood for the Kth fold by subtraction; by subtracting the log partial likelihood evaluated on the full dataset from that evaluated on the on the (K-1)/K dataset. This makes more efficient use of risk sets. With grouped=FALSE the log partial likelihood is computed only on the Kth fold |
type.min |
parameter for chosing optimal model: 'lambda.min'- value of lambda that gives minimum mean cross-validated error (cvm). 'lambda.1se' - largest value of lambda such that error is within one standard error of the minimum. |
family |
family of the model, i.e. cox, glm,... |
verbose |
verbose |
... |
Further parameters |
q.val |
minimal value of Q.func on the interval defined by bounds. Here, q.val is minimum mean cross-validate d error (cvm) |
model |
model list
|
Natalia Becker natalia.becker at dkfz.de
Sill M., Hielscher T., Becker N. and Zucknick M. (2014), c060: Extended Inference with Lasso and Elastic-Net Regularized Cox and Generalized Linear Models, Journal of Statistical Software, Volume 62(5), pages 1–22. doi:10.18637/jss.v062.i05