| Title: | Partial Least Squares Regression for Generalized Linear Models |
|---|---|
| Description: | Provides (weighted) partial least squares regression for generalized linear models and repeated k-fold cross-validation of such models using various criteria <doi:10.48550/arXiv.1810.01005>. It allows for missing data in the explanatory variables. Bootstrap confidence interval construction is also available. |
| Authors: | Frederic Bertrand [cre, aut] |
| Maintainer: | Frederic Bertrand <[email protected]> |
| License: | GPL-3 |
| Version: | 1.7.0 |
| Built: | 2026-05-29 19:32:35 UTC |
| Source: | https://github.com/fbertran/plsrglm |
This function computes the Akaike and Bayesian Information Criteria and the Generalized minimum description length.
aic.dof(RSS, n, DoF, sigmahat)
bic.dof(RSS, n, DoF, sigmahat)
gmdl.dof(sigmahat, n, DoF, yhat)
RSS |
vector of residual sum of squares. |
n |
number of observations. |
DoF |
vector of Degrees of Freedom. The length of |
sigmahat |
Estimated model error. The length of |
yhat |
vector of squared norm of Yhat. The length of |
The gmdl criterion is defined as
with
vector |
numerical values of the requested AIC, BIC or GMDL. |
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
M. Hansen, B. Yu. (2001). Model Selection and Minimum Description
Length Principle, Journal of the American Statistical Association,
96, 746-774.
N. Kraemer, M. Sugiyama. (2011). The Degrees of Freedom of
Partial Least Squares Regression. Journal of the American Statistical
Association, 106(494), 697-705.
N. Kraemer, M.L. Braun, Kernelizing PLS,
Degrees of Freedom, and Efficient Model Selection, Proceedings of the
24th International Conference on Machine Learning, Omni Press, (2007)
441-448.
plsR.dof for degrees of freedom computation and
infcrit.dof for computing information criteria directly from a
previously fitted plsR model.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
modpls <- plsR(yCornell,XCornell,4)
dof.object <- plsR.dof(modpls)
aic.dof(modpls$RSS,modpls$nr,dof.object$DoF,dof.object$sigmahat)
bic.dof(modpls$RSS,modpls$nr,dof.object$DoF,dof.object$sigmahat)
gmdl.dof(dof.object$sigmahat,modpls$nr,dof.object$DoF,dof.object$yhat)
naive.object <- plsR.dof(modpls,naive=TRUE)
aic.dof(modpls$RSS,modpls$nr,naive.object$DoF,naive.object$sigmahat)
bic.dof(modpls$RSS,modpls$nr,naive.object$DoF,naive.object$sigmahat)
gmdl.dof(naive.object$sigmahat,modpls$nr,naive.object$DoF,naive.object$yhat)
This function provides AIC computation for a univariate plsR model.
AICpls(ncomp, residpls, weights = rep.int(1, length(residpls)))
ncomp |
Number of components |
residpls |
Residuals of a fitted univariate plsR model |
weights |
Weights of observations |
AIC function for plsR models with univariate response.
real |
AIC value |
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Baibing Li, Julian Morris, Elaine B. Martin, Model selection for partial least squares regression, Chemometrics and Intelligent Laboratory Systems 64 (2002) 79-89, doi:10.1016/S0169-7439(02)00051-5.
loglikpls for log-likelihood computations for plsR
models and AIC for AIC computation for linear models
data(pine)
ypine <- pine[,11]
Xpine <- pine[,1:10]
(Pinscaled <- as.data.frame(cbind(scale(ypine),scale(as.matrix(Xpine)))))
colnames(Pinscaled)[1] <- "yy"
lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled)
modpls <- plsR(ypine,Xpine,10)
modpls$Std.Coeffs
lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled)
AIC(lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled))
print(logLik(lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled)))
sum(dnorm(modpls$RepY, modpls$Std.ValsPredictY, sqrt(mean(modpls$residY^2)), log=TRUE))
sum(dnorm(Pinscaled$yy,fitted(lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled)), sqrt(mean(residuals(lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled))^2)), log=TRUE))
loglikpls(modpls$residY)
loglikpls(residuals(lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled)))
AICpls(10,residuals(lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled)))
AICpls(10,modpls$residY)
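The example above compares loglikpls with a sum of Gaussian log-densities whose standard deviation is sqrt(mean(residuals^2)). The following base-R sketch illustrates that identity on an ordinary lm fit to the built-in mtcars data, used here as a stand-in for a PLS fit. The helpers gauss_loglik and gauss_aic are hypothetical names, and the parameter count ncomp + 2 (slopes, intercept, error variance) is an assumption chosen to match stats::AIC for a linear model; it is not a statement about the internals of AICpls.

```r
# Hypothetical helpers (not part of plsRglm): Gaussian log-likelihood and AIC
# computed from residuals only, using the ML variance estimate mean(res^2).
gauss_loglik <- function(res) {
  sigma_ml <- sqrt(mean(res^2))
  sum(dnorm(res, mean = 0, sd = sigma_ml, log = TRUE))
}

# Assumption: ncomp slopes + 1 intercept + 1 variance parameter.
gauss_aic <- function(ncomp, res) {
  -2 * gauss_loglik(res) + 2 * (ncomp + 2)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in for a fitted model
res <- residuals(fit)

ll_sketch  <- gauss_loglik(res)
aic_sketch <- gauss_aic(2, res)           # two predictors: wt and hp

# Compare with the values reported by stats::logLik() and stats::AIC()
all.equal(ll_sketch, as.numeric(logLik(fit)))
all.equal(aic_sketch, AIC(fit))
```

Working from residuals alone is what makes the same computation applicable to any fitted model with a univariate Gaussian response, whichever way the fit was obtained.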
This database was collected on patients carrying a colon adenocarcinoma. It
has 104 observations on 33 binary qualitative explanatory variables and one
response variable y representing the cancer stage according to the
Astler-Coller classification (Astler and Coller, 1954). This dataset has
some missing data due to technical limits. A microsatellite is a non-coding
DNA sequence.
A data frame with 104 observations on the following 34 variables.
the response: a binary vector (Astler-Coller score).
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
Weber et al. (2007). Allelotyping analyzes of synchronous primary and metastasis CIN colon cancers identified different subtypes. Int J Cancer, 120(3), pages 524-32.
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Société Française de Statistique, 151(2), pages 1-18.
data(aze)
str(aze)
This is a single imputation of the aze dataset which was
collected on patients carrying a colon adenocarcinoma. It has 104
observations on 33 binary qualitative explanatory variables and one response
variable y representing the cancer stage according to the
Astler-Coller classification (Astler and Coller, 1954). A microsatellite is
a non-coding DNA sequence.
A data frame with 104 observations on the following 34 variables.
the response: a binary vector (Astler-Coller score).
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
a binary vector that indicates whether this microsatellite is altered or not.
Weber et al. (2007). Allelotyping analyzes of synchronous primary and metastasis CIN colon cancers identified different subtypes. Int J Cancer, 120(3), pages 524-32.
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Société Française de Statistique, 151(2), pages 1-18.
data(aze_compl)
str(aze_compl)
Provides a wrapper for the bootstrap function boot from the
boot R package.
Implements non-parametric bootstraps for PLS
Regression models by either (Y,X) or (Y,T) resampling.
bootpls(
  object,
  typeboot = "plsmodel",
  R = 250,
  statistic = NULL,
  sim = "ordinary",
  stype = "i",
  stabvalue = 1e+06,
  verbose = TRUE,
  ...
)
object |
An object of class |
typeboot |
The type of bootstrap. Either (Y,X) bootstrap
( |
R |
The number of bootstrap replicates. Usually this will be a single
positive integer. For importance resampling, some resamples may use one set
of weights and others use a different set of weights. In this case |
statistic |
A function which when applied to data returns a vector
containing the statistic(s) of interest. |
sim |
A character string indicating the type of simulation required.
Possible values are |
stype |
A character string indicating what the second argument of
|
stabvalue |
A value to hard threshold bootstrap estimates computed from atypical resamplings. Especially useful for Generalized Linear Models. |
verbose |
Should info messages be displayed? |
... |
Other named arguments for |
More details on bootstrap techniques are available in the help of the
boot function.
An object of class "boot". See the Value part of the help of
the function boot.
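As an illustration of (Y,X) resampling through boot, the sketch below bootstraps the coefficients of an ordinary linear model on the built-in mtcars data. The lm fit is a stand-in chosen so the example runs without plsRglm, and coef_stat is a hypothetical statistic function; this is not the statistic that bootpls uses internally.

```r
library(boot)

set.seed(1)

# Statistic: refit the model on the resampled rows and return its coefficients.
# With stype = "i", boot() passes a vector of row indices as the second argument.
coef_stat <- function(data, indices) {
  coef(lm(mpg ~ wt + hp, data = data[indices, ]))
}

# (Y,X) resampling: whole rows (response and predictors) are resampled together.
boot_yx <- boot(data = mtcars, statistic = coef_stat, R = 200,
                sim = "ordinary", stype = "i")

class(boot_yx)   # object of class "boot"
dim(boot_yx$t)   # one row per replicate, one column per coefficient
boot_yx$t0       # estimates computed on the original data
```

The returned object carries the original estimates in t0 and the replicates in t, which is the structure that boot.ci, plot and jack.after.boot from the boot package operate on.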
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
A. Lazraq, R. Cleroux, and J.-P. Gauchi. (2003). Selecting both latent and explanatory variables in the PLS1 regression model. Chemometrics and Intelligent Laboratory Systems, 66(2):117-126.
P. Bastien, V. Esposito-Vinzi, and M. Tenenhaus. (2005). PLS generalised linear regression. Computational Statistics & Data Analysis, 48(1):17-46.
A. C. Davison and D. V. Hinkley. (1997). Bootstrap Methods and Their Applications. Cambridge University Press, Cambridge.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]

# Lazraq-Cleroux PLS ordinary bootstrap
set.seed(250)
modpls <- plsR(yCornell,XCornell,3)

#(Y,X) resampling
Cornell.bootYX <- bootpls(modpls, R=250, verbose=FALSE)

#(Y,T) resampling
Cornell.bootYT <- bootpls(modpls, typeboot="fmodel_np", R=250, verbose=FALSE)

# Using the boxplots.bootpls function
boxplots.bootpls(Cornell.bootYX,indices=2:8)
# Confidence intervals plotting
confints.bootpls(Cornell.bootYX,indices=2:8)
plots.confints.bootpls(confints.bootpls(Cornell.bootYX,indices=2:8))
# Graph similar to the one of Bastien et al. in CSDA 2005
boxplot(as.vector(Cornell.bootYX$t[,-1])~factor(rep(1:7,rep(250,7))),
main="Bootstrap distributions of standardised bj (j = 1, ..., 7).")
points(c(1:7),Cornell.bootYX$t0[-1],col="red",pch=19)

library(boot)
boot.ci(Cornell.bootYX, conf = c(0.90,0.95), type = c("norm","basic","perc","bca"), index=2)
plot(Cornell.bootYX,index=2)
jack.after.boot(Cornell.bootYX, index=2, useJ=TRUE, nt=3)
plot(Cornell.bootYX,index=2,jack=TRUE)

car::dataEllipse(Cornell.bootYX$t[,2], Cornell.bootYX$t[,3], cex=.3,
levels=c(.5, .95, .99), robust=TRUE)
rm(Cornell.bootYX)

# PLS balanced bootstrap
set.seed(225)
Cornell.bootYX <- bootpls(modpls, sim="balanced", R=250, verbose=FALSE)
boot.array(Cornell.bootYX, indices=TRUE)

# Using the boxplots.bootpls function
boxplots.bootpls(Cornell.bootYX,indices=2:8)
# Confidence intervals plotting
confints.bootpls(Cornell.bootYX,indices=2:8)
plots.confints.bootpls(confints.bootpls(Cornell.bootYX,indices=2:8))
# Graph similar to the one of Bastien et al. in CSDA 2005
boxplot(as.vector(Cornell.bootYX$t[,-1])~factor(rep(1:7,rep(250,7))),
main="Bootstrap distributions of standardised bj (j = 1, ..., 7).")
points(c(1:7),Cornell.bootYX$t0[-1],col="red",pch=19)

library(boot)
boot.ci(Cornell.bootYX, conf = c(0.90,0.95), type = c("norm","basic","perc","bca"),
index=2, verbose=FALSE)
plot(Cornell.bootYX,index=2)
jack.after.boot(Cornell.bootYX, index=2, useJ=TRUE, nt=3)
plot(Cornell.bootYX,index=2,jack=TRUE)
rm(Cornell.bootYX)

# PLS permutation bootstrap
set.seed(500)
Cornell.bootYX <- bootpls(modpls, sim="permutation", R=1000, verbose=FALSE)
boot.array(Cornell.bootYX, indices=TRUE)

# Graph of bootstrap distributions
boxplot(as.vector(Cornell.bootYX$t[,-1])~factor(rep(1:7,rep(1000,7))),
main="Bootstrap distributions of standardised bj (j = 1, ..., 7).")
points(c(1:7),Cornell.bootYX$t0[-1],col="red",pch=19)
# Using the boxplots.bootpls function
boxplots.bootpls(Cornell.bootYX,indices=2:8)

library(boot)
plot(Cornell.bootYX,index=2)

qqnorm(Cornell.bootYX$t[,2],ylim=c(-1,1))
abline(h=Cornell.bootYX$t0[2],lty=2)
(sum(abs(Cornell.bootYX$t[,2])>=abs(Cornell.bootYX$t0[2]))+1)/(length(Cornell.bootYX$t[,2])+1)
rm(Cornell.bootYX)
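The last expression of the example computes a two-sided permutation p-value from the replicates. A minimal, self-contained sketch of the same computation is given below, assuming simulated data and a simple regression slope as the statistic; slope_stat and the simulated variables are hypothetical and do not reproduce how bootpls handles sim = "permutation" internally.

```r
library(boot)

set.seed(500)
n <- 40
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Under sim = "permutation", `indices` is a permutation of 1:n. Applying it to
# y only breaks the link between x and y, which simulates the null hypothesis
# of no association.
slope_stat <- function(data, indices) {
  coef(lm(data$y[indices] ~ data$x))[2]
}

perm <- boot(dat, slope_stat, R = 999, sim = "permutation")

# Two-sided p-value; the +1 terms count the observed statistic itself
p_value <- (sum(abs(perm$t[, 1]) >= abs(perm$t0[1])) + 1) /
  (length(perm$t[, 1]) + 1)
p_value
```

The +1 correction keeps the p-value strictly positive, so its smallest attainable value is 1/(R + 1).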
Provides a wrapper for the bootstrap function boot from the
boot R package.
Implements non-parametric bootstraps for PLS
Generalized Linear Regression models by either (Y,X) or (Y,T) resampling.
bootplsglm(
  object,
  typeboot = "fmodel_np",
  R = 250,
  statistic = NULL,
  sim = "ordinary",
  stype = "i",
  stabvalue = 1e+06,
  verbose = TRUE,
  ...
)
object |
An object of class |
typeboot |
The type of bootstrap. Either (Y,X) bootstrap
( |
R |
The number of bootstrap replicates. Usually this will be a single
positive integer. For importance resampling, some resamples may use one set
of weights and others use a different set of weights. In this case |
statistic |
A function which when applied to data returns a vector
containing the statistic(s) of interest. |
sim |
A character string indicating the type of simulation required.
Possible values are |
stype |
A character string indicating what the second argument of
|
stabvalue |
A value to hard threshold bootstrap estimates computed from atypical resamplings. Especially useful for Generalized Linear Models. |
verbose |
Should info messages be displayed? |
... |
Other named arguments for |
More details on bootstrap techniques are available in the help of the
boot function.
An object of class "boot". See the Value part of the help of
the function boot.
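To make the role of a hard threshold such as stabvalue concrete, the sketch below bootstraps a plain logistic regression on simulated data and clamps each replicate to a fixed range before computing a percentile interval with boot.ci. Everything here is an assumption made for illustration: glm stands in for a PLS-GLM fit, logit_stat and stab are hypothetical names, and the clamping rule is one plausible reading of "hard threshold", not the package's implementation.

```r
library(boot)

set.seed(42)
n <- 100
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.8 * x))
dat <- data.frame(x = x, y = y)

stab <- 1e6  # hypothetical threshold, mirroring the default value of stabvalue

# Refit a logistic regression on the resampled rows, then clamp the estimates
# to [-stab, stab] so that a degenerate resample (for example one with perfect
# separation) cannot dominate the bootstrap summaries.
logit_stat <- function(data, indices) {
  fit <- suppressWarnings(glm(y ~ x, family = binomial, data = data[indices, ]))
  est <- coef(fit)
  pmax(pmin(est, stab), -stab)
}

boot_logit <- boot(dat, logit_stat, R = 200)

# Percentile interval for the slope (second coefficient); columns 4 and 5 of
# the `percent` component hold the lower and upper endpoints.
ci <- boot.ci(boot_logit, conf = 0.95, type = "perc", index = 2)
ci$percent[4:5]
```

Percentile intervals depend only on the order statistics of the replicates, so a few clamped extreme values affect them far less than they would affect a normal-approximation interval built from the bootstrap standard deviation.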
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
A. Lazraq, R. Cleroux, and J.-P. Gauchi. (2003). Selecting both latent and explanatory variables in the PLS1 regression model. Chemometrics and Intelligent Laboratory Systems, 66(2):117-126.
P. Bastien, V. Esposito-Vinzi, and M. Tenenhaus. (2005). PLS generalised linear regression. Computational Statistics & Data Analysis, 48(1):17-46.
A. C. Davison and D. V. Hinkley. (1997). Bootstrap Methods and Their Applications. Cambridge University Press, Cambridge.
#Imputed aze dataset
data(aze_compl)
Xaze_compl<-aze_compl[,2:34]
yaze_compl<-aze_compl$y

dataset <- cbind(y=yaze_compl,Xaze_compl)
modplsglm <- plsRglm(y~.,data=dataset,3,modele="pls-glm-logistic")

library(boot)
# Bastien (Y,T) PLS bootstrap
aze_compl.bootYT <- bootplsglm(modplsglm, R=250, verbose=FALSE)
boxplots.bootpls(aze_compl.bootYT)
confints.bootpls(aze_compl.bootYT)
plots.confints.bootpls(confints.bootpls(aze_compl.bootYT))

# (Y,X) PLS bootstrap
aze_compl.bootYX <- bootplsglm(modplsglm, R=250, verbose=FALSE,
typeboot = "plsmodel")
boxplots.bootpls(aze_compl.bootYX)
confints.bootpls(aze_compl.bootYX)
plots.confints.bootpls(confints.bootpls(aze_compl.bootYX))

# (Y,X) PLS bootstrap raw coefficients
aze_compl.bootYX.raw <- bootplsglm(modplsglm, R=250, verbose=FALSE,
typeboot = "plsmodel", statistic=coefs.plsRglm.raw)
boxplots.bootpls(aze_compl.bootYX.raw)
confints.bootpls(aze_compl.bootYX.raw)
plots.confints.bootpls(confints.bootpls(aze_compl.bootYX.raw))

plot(aze_compl.bootYT,index=2)
jack.after.boot(aze_compl.bootYT, index=2, useJ=TRUE, nt=3)
plot(aze_compl.bootYT, index=2,jack=TRUE)
aze_compl.tilt.boot <- tilt.bootplsglm(modplsglm, statistic=coefs.plsRglm,
R=c(499, 100, 100), alpha=c(0.025, 0.975), sim="ordinary", stype="i", index=1)

# PLS bootstrap balanced
aze_compl.bootYT <- bootplsglm(modplsglm, sim="balanced", R=250, verbose=FALSE)
boxplots.bootpls(aze_compl.bootYT)
confints.bootpls(aze_compl.bootYT)
plots.confints.bootpls(confints.bootpls(aze_compl.bootYT))

plot(aze_compl.bootYT)
jack.after.boot(aze_compl.bootYT, index=1, useJ=TRUE, nt=3)
plot(aze_compl.bootYT,jack=TRUE)
aze_compl.tilt.boot <- tilt.bootplsglm(modplsglm, statistic=coefs.plsR,
R=c(499, 100, 100), alpha=c(0.025, 0.975), sim="balanced", stype="i", index=1)

# PLS permutation bootstrap
aze_compl.bootYT <- bootplsglm(modplsglm, sim="permutation", R=250, verbose=FALSE)
boxplots.bootpls(aze_compl.bootYT)
plot(aze_compl.bootYT)

#Original aze dataset with missing values
data(aze)
Xaze<-aze[,2:34]
yaze<-aze$y

library(boot)
modplsglm2 <- plsRglm(yaze,Xaze,3,modele="pls-glm-logistic")
aze.bootYT <- bootplsglm(modplsglm2, R=250, verbose=FALSE)
boxplots.bootpls(aze.bootYT)
confints.bootpls(aze.bootYT)
plots.confints.bootpls(confints.bootpls(aze.bootYT))

#Ordinal logistic regression
data(bordeaux)
Xbordeaux<-bordeaux[,1:4]
ybordeaux<-factor(bordeaux$Quality,ordered=TRUE)
dataset <- cbind(y=ybordeaux,Xbordeaux)
options(contrasts = c("contr.treatment", "contr.poly"))
modplsglm3 <- plsRglm(ybordeaux,Xbordeaux,1,modele="pls-glm-polr")
bordeaux.bootYT<- bootplsglm(modplsglm3, sim="permutation", R=250, verbose=FALSE)
boxplots.bootpls(bordeaux.bootYT)
boxplots.bootpls(bordeaux.bootYT,ranget0=TRUE)

bordeaux.bootYT2<- bootplsglm(modplsglm3, sim="permutation", R=250,
strata=unclass(ybordeaux), verbose=FALSE)
boxplots.bootpls(bordeaux.bootYT2,ranget0=TRUE)

if(require(chemometrics)){
data(hyptis)
hyptis
yhyptis <- factor(hyptis$Group,ordered=TRUE)
Xhyptis <- as.data.frame(hyptis[,c(1:6)])
dataset <- cbind(y=yhyptis,Xhyptis)
options(contrasts = c("contr.treatment", "contr.poly"))
modplsglm4 <- plsRglm(yhyptis,Xhyptis,3,modele="pls-glm-polr")

hyptis.bootYT3<- bootplsglm(modplsglm4, sim="permutation", R=250, verbose=FALSE)
rownames(hyptis.bootYT3$t0)<-c("Sabi\nnene","Pin\nene",
"Cine\nole","Terpi\nnene","Fenc\nhone","Terpi\nnolene")
boxplots.bootpls(hyptis.bootYT3)
boxplots.bootpls(hyptis.bootYT3,xaxisticks=FALSE)
boxplots.bootpls(hyptis.bootYT3,ranget0=TRUE)
boxplots.bootpls(hyptis.bootYT3,ranget0=TRUE,xaxisticks=FALSE)
}
Quality of Bordeaux wines (Quality) and four potentially predictive
variables (Temperature, Sunshine, Heat and
Rain).
A data frame with 34 observations on the following 5 variables.
a numeric vector
a numeric vector
a numeric vector
a numeric vector
an ordered factor with levels 1 < 2 < 3
P. Bastien, V. Esposito-Vinzi, and M. Tenenhaus. (2005). PLS generalised linear regression. Computational Statistics & Data Analysis, 48(1):17-46.
M. Tenenhaus. (2005). La regression logistique PLS. In J.-J. Droesbeke, M. Lejeune, and G. Saporta, editors, Modeles statistiques pour donnees qualitatives. Editions Technip, Paris.
data(bordeaux)
str(bordeaux)
Quality of Bordeaux wines (Quality) and four potentially predictive
variables (Temperature, Sunshine, Heat and
Rain).
A data frame with 34 observations on the following 5 variables.
a numeric vector
a numeric vector
a numeric vector
a numeric vector
an ordered factor with levels 1 < 2 < 3
The value of x1 for the first observation was removed from the matrix of predictors on purpose.
bordeauxNA is a dataset with a missing value, provided for testing purposes.
P. Bastien, V. Esposito-Vinzi, and M. Tenenhaus. (2005). PLS generalised linear regression. Computational Statistics & Data Analysis, 48(1):17-46.
M. Tenenhaus. (2005). La regression logistique PLS. In J.-J. Droesbeke, M. Lejeune, and G. Saporta, editors, Modeles statistiques pour donnees qualitatives. Editions Technip, Paris.
data(bordeauxNA)
str(bordeauxNA)
Boxplots for bootstrap distributions.
boxplots.bootpls(
  bootobject,
  indices = NULL,
  prednames = TRUE,
  articlestyle = TRUE,
  xaxisticks = TRUE,
  ranget0 = FALSE,
  las = par("las"),
  mar,
  mgp,
  ...
)
bootobject |
an object of class |
indices |
vector of indices of the variables to plot. Defaults to
|
prednames |
Should the original names of the predictors be plotted?
Defaults to |
articlestyle |
Should the extra blank zones of the margin be removed
from the plot? Defaults to |
xaxisticks |
Should ticks for the x axis be plotted? Defaults to
|
ranget0 |
Should the vertical range of the plot be computed to
include the initial estimates of the coefficients? Defaults to
|
las |
numeric in 0,1,2,3; the style of axis labels. 0: always parallel to the axis [default], 1: always horizontal, 2: always perpendicular to the axis, 3: always vertical. |
mar |
A numerical vector of the form |
mgp |
The margin line (in mex units) for the axis title, axis labels
and axis line. Note that |
... |
further options to pass to the
|
NULL
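The las, mar and mgp arguments carry the same names as the graphical parameters set through par. The base R sketch below does not call boxplots.bootpls; it uses simulated replicates (the reps matrix is an assumption made for illustration) to show what values such as las=3 and mar=c(5,2,1,1) do to an ordinary boxplot.

```r
# Draw to a null device so that the sketch runs non-interactively
pdf(NULL)

# Hypothetical bootstrap replicates: 250 draws for 3 coefficients
set.seed(1)
reps <- matrix(rnorm(250 * 3, mean = c(0.5, -1, 2)), ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("X1", "X2", "X3")))

# Save the current settings, then tighten the margins, pull the axis
# annotations closer to the plot and turn the axis labels vertical
old_par <- par(mar = c(5, 2, 1, 1), mgp = c(2, 0.5, 0), las = 3)
bp <- boxplot(reps, main = "")
par(old_par)   # restore the previous graphical parameters

invisible(dev.off())
```

Restoring the previous par settings afterwards keeps later plots in the same session unaffected.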
Frédéric Bertrand
https://fbertran.github.io/homepage/
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]

# Lazraq-Cleroux PLS ordinary bootstrap
set.seed(250)
modpls <- plsR(yCornell,XCornell,3)
Cornell.bootYX <- bootpls(modpls, R=250)

# Graph similar to the one of Bastien et al. in CSDA 2005
boxplots.bootpls(Cornell.bootYX,indices=2:8)

data(aze_compl)
modplsglm<-plsRglm(y~.,data=aze_compl,3,modele="pls-glm-logistic")
aze_compl.boot3 <- bootplsglm(modplsglm, R=250, verbose=FALSE)
boxplots.bootpls(aze_compl.boot3)
boxplots.bootpls(aze_compl.boot3,las=3,mar=c(5,2,1,1))
boxplots.bootpls(aze_compl.boot3,indices=c(2,4,6),prednames=FALSE)
This helper plots the individuals and predictors from a fitted
plsR or plsRglm model while coloring the
individuals according to a grouping variable.
classbiplot(
  object,
  group = NULL,
  comps = 1:2,
  col,
  colvar = "gray30",
  pch = 19,
  cex = rep(par("cex"), 2),
  xlabs = NULL,
  ylabs = NULL,
  point.labels = FALSE,
  show.legend = TRUE,
  legendpos = "topright",
  var.axes = TRUE,
  expand = 1,
  xlim = NULL,
  ylim = NULL,
  arrow.len = 0.1,
  main = NULL,
  sub = NULL,
  xlab = NULL,
  ylab = NULL,
  ...
)
| Argument | Description |
|---|---|
| object | an object containing score and loading matrices in |
| group | optional grouping vector for the individuals. When supplied, observations are colored according to the levels of group |
| comps | integer vector of length 2 giving the components to display. |
| col | colors for the individuals. If |
| colvar | color used for variable labels, arrows and axes. |
| pch | plotting character for the individuals. |
| cex | character expansion. Length 1 is recycled to length 2: the first value is used for the individuals and the second one for the variables. |
| xlabs | optional labels for the individuals. |
| ylabs | optional labels for the variables. |
| point.labels | should the individuals be displayed using text labels instead of points? Defaults to FALSE |
| show.legend | should a legend be added when |
| legendpos | position of the legend as in |
| var.axes | should arrows be drawn for the variables? Defaults to TRUE |
| expand | expansion factor for the variables layer, as in |
| xlim, ylim | limits for the scores panel. When both are missing they are chosen symmetrically as in |
| arrow.len | length of the arrows for the variables. |
| main, sub, xlab, ylab | usual graphical parameters passed to |
| ... | further graphical parameters passed to |
Invisibly returns a list with the scores, loadings, colors, grouping factor and scaling ratio used in the plot.
Frédéric Bertrand
https://fbertran.github.io/homepage/
data(Cornell)
modpls <- plsR(Y ~ ., data = Cornell, nt = 2)
grp <- factor(Cornell$Y > median(Cornell$Y), labels = c("Low", "High"))
classbiplot(modpls, group = grp, col = c("firebrick3", "steelblue3"))
This function provides a coef method for the class "plsRglmmodel"
## S3 method for class 'plsRglmmodel'
coef(object, type = c("scaled", "original"), ...)
| Argument | Description |
|---|---|
| object | an object of the class "plsRglmmodel" |
| type | if |
| ... | not used |
An object of class coef.plsRglmmodel.
| Component | Description |
|---|---|
| CoeffC | Coefficients of the components. |
| Std.Coeffs | Coefficients of the scaled predictors in the regression function. |
| Coeffs | Coefficients of the untransformed predictors (on their original scale). |
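For a linear fit, coefficients on scaled predictors and coefficients on the original scale are linked by the usual standardisation identity. The base R sketch below illustrates that identity with lm on simulated data; it is not the package's code, and treating Std.Coeffs and Coeffs as related in exactly this way is an assumption made for illustration.

```r
set.seed(42)
n <- 30
X <- cbind(x1 = rnorm(n, 10, 2), x2 = rnorm(n, -3, 5))
y <- 1 + 0.5 * X[, "x1"] - 0.2 * X[, "x2"] + rnorm(n, sd = 0.3)

# Fit on centred and scaled variables (no intercept is needed after centring)
Xs <- scale(X)
ys <- as.numeric(scale(y))
std_coeffs <- coef(lm(ys ~ Xs - 1))

# Back-transform to the original measurement scale
slopes    <- std_coeffs * sd(y) / apply(X, 2, sd)
intercept <- mean(y) - sum(slopes * colMeans(X))
orig_coeffs <- c(intercept, slopes)

# Reference fit on the untransformed data
ref <- coef(lm(y ~ X))
print(cbind(back_transformed = orig_coeffs, direct = ref))
```

The two columns printed at the end agree up to numerical precision, which is the property the back-transformation relies on.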
Frédéric Bertrand
https://fbertran.github.io/homepage/
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
modpls <- plsRglm(yCornell,XCornell,3,modele="pls-glm-family",family=gaussian())
class(modpls)
coef(modpls)
coef(modpls,type="scaled")
rm(list=c("XCornell","yCornell","modpls"))
This function provides a coef method for the class "plsRmodel"
## S3 method for class 'plsRmodel'
coef(object, type = c("scaled", "original"), ...)
| Argument | Description |
|---|---|
| object | an object of the class "plsRmodel" |
| type | if |
| ... | not used |
An object of class coef.plsRmodel.
| Component | Description |
|---|---|
| CoeffC | Coefficients of the components. |
| Std.Coeffs | Coefficients of the scaled predictors. |
| Coeffs | Coefficients of the untransformed predictors (on their original scale). |
Frédéric Bertrand
https://fbertran.github.io/homepage/
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
modpls <- plsRglm(yCornell,XCornell,3,modele="pls")
class(modpls)
coef(modpls)
coef(modpls,type="scaled")
rm(list=c("XCornell","yCornell","modpls"))
A function passed to boot to perform bootstrap.
coefs.plsR(dataset, ind, nt, modele, maxcoefvalues, ifbootfail, verbose)
| Argument | Description |
|---|---|
| dataset | dataset to resample |
| ind | indices for resampling |
| nt | number of components to use |
| modele | type of model to use, see plsR |
| maxcoefvalues | maximum values allowed for the estimates of the coefficients to discard those coming from singular bootstrap samples |
| ifbootfail | value to return if the estimation fails on a bootstrap sample |
| verbose | should info messages be displayed? |
estimates on a bootstrap sample or ifbootfail value if the
bootstrap computation fails.
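The contract described above (take a dataset and resampling indices, return the estimates, and fall back to ifbootfail when the fit breaks down) can be sketched in base R. The function coefs_lm_sketch below is hypothetical: it fits an ordinary linear model rather than a PLS model, uses a scalar maxcoefvalues, and is only meant to show the shape of such a statistic.

```r
# Hypothetical statistic with a (dataset, ind, ...) signature; it fits an
# ordinary linear model instead of a PLS model. The response is column 1.
coefs_lm_sketch <- function(dataset, ind, maxcoefvalues, ifbootfail) {
  resampled <- dataset[ind, , drop = FALSE]
  est <- tryCatch(
    coef(lm(resampled[, 1] ~ as.matrix(resampled[, -1, drop = FALSE]))),
    error = function(e) NULL
  )
  # Fall back to ifbootfail when the fit fails, is rank deficient (NA
  # coefficients) or yields implausibly large estimates
  if (is.null(est) || anyNA(est) || any(abs(est) > maxcoefvalues)) {
    return(ifbootfail)
  }
  unname(est)
}

set.seed(250)
dat <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
R <- 50
boot_reps <- t(replicate(R, coefs_lm_sketch(
  dat, sample(nrow(dat), replace = TRUE),
  maxcoefvalues = 1e5, ifbootfail = rep(NA_real_, 3)
)))
dim(boot_reps)  # R rows, one column per coefficient
```

A degenerate resample, for instance one that repeats a single row, produces a rank-deficient fit and therefore returns the ifbootfail value instead of estimates.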
Frédéric Bertrand
https://fbertran.github.io/homepage/
See also bootpls.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]

# Lazraq-Cleroux PLS (Y,X) bootstrap
# statistic=coefs.plsR is the default for (Y,X) resampling of PLSR models.
set.seed(250)
modpls <- plsR(yCornell,XCornell,1)
Cornell.bootYX <- bootpls(modpls, R=250, statistic=coefs.plsR, verbose=FALSE)
A function passed to boot to perform bootstrap.
coefs.plsR.raw(dataset, ind, nt, modele, maxcoefvalues, ifbootfail, verbose)
| Argument | Description |
|---|---|
| dataset | dataset to resample |
| ind | indices for resampling |
| nt | number of components to use |
| modele | type of model to use, see plsR |
| maxcoefvalues | maximum values allowed for the estimates of the coefficients to discard those coming from singular bootstrap samples |
| ifbootfail | value to return if the estimation fails on a bootstrap sample |
| verbose | should info messages be displayed? |
estimates on a bootstrap sample or ifbootfail value if the
bootstrap computation fails.
Frédéric Bertrand
https://fbertran.github.io/homepage/
See also bootpls.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]

# Lazraq-Cleroux PLS (Y,X) bootstrap
set.seed(250)
modpls <- coefs.plsR.raw(Cornell[,-8],1:nrow(Cornell),nt=3,
maxcoefvalues=1e5,ifbootfail=rep(0,3),verbose=FALSE)
A function passed to boot to perform bootstrap.
coefs.plsRglm(
  dataset,
  ind,
  nt,
  modele,
  family = NULL,
  fit_backend = "stats",
  maxcoefvalues,
  ifbootfail,
  verbose
)
| Argument | Description |
|---|---|
| dataset | dataset to resample |
| ind | indices for resampling |
| nt | number of components to use |
| modele | type of model to use, see plsRglm |
| family | glm family to use, see plsRglm |
| fit_backend | backend used for repeated non-ordinal score-space GLM fits. Use |
| maxcoefvalues | maximum values allowed for the estimates of the coefficients to discard those coming from singular bootstrap samples |
| ifbootfail | value to return if the estimation fails on a bootstrap sample |
| verbose | should info messages be displayed? |
estimates on a bootstrap sample or ifbootfail value if the
bootstrap computation fails.
Frédéric Bertrand
https://fbertran.github.io/homepage/
See also bootplsglm.
data(Cornell)

# (Y,X) bootstrap of a PLSGLR model
# statistic=coefs.plsRglm is the default for (Y,X) bootstrap of PLSGLR models.
set.seed(250)
modplsglm <- plsRglm(Y~.,data=Cornell,1,modele="pls-glm-family",family=gaussian)
Cornell.bootYX <- bootplsglm(modplsglm, R=250, typeboot="plsmodel",
statistic=coefs.plsRglm, verbose=FALSE)
A function passed to boot to perform bootstrap.
coefs.plsRglm.raw(
  dataset,
  ind,
  nt,
  modele,
  family = NULL,
  fit_backend = "stats",
  maxcoefvalues,
  ifbootfail,
  verbose
)
| Argument | Description |
|---|---|
| dataset | dataset to resample |
| ind | indices for resampling |
| nt | number of components to use |
| modele | type of model to use, see plsRglm |
| family | glm family to use, see plsRglm |
| fit_backend | backend used for repeated non-ordinal score-space GLM fits. Use |
| maxcoefvalues | maximum values allowed for the estimates of the coefficients to discard those coming from singular bootstrap samples |
| ifbootfail | value to return if the estimation fails on a bootstrap sample |
| verbose | should info messages be displayed? |
estimates on a bootstrap sample or ifbootfail value if the
bootstrap computation fails.
Frédéric Bertrand
https://fbertran.github.io/homepage/
See also bootplsglm.
data(Cornell)

# (Y,X) bootstrap of a PLSGLR model
set.seed(250)
modplsglm <- coefs.plsRglm.raw(Cornell[,-8],1:nrow(Cornell),nt=3,
modele="pls-glm-family",family=gaussian,maxcoefvalues=1e5,
ifbootfail=rep(0,3),verbose=FALSE)
A function passed to boot to perform bootstrap.
coefs.plsRglmnp(
  dataRepYtt,
  ind,
  nt,
  modele,
  family = NULL,
  maxcoefvalues,
  wwetoile,
  ifbootfail
)
| Argument | Description |
|---|---|
| dataRepYtt | components' coordinates to bootstrap |
| ind | indices for resampling |
| nt | number of components to use |
| modele | type of model to use, see plsRglm |
| family | glm family to use, see plsRglm |
| maxcoefvalues | maximum values allowed for the estimates of the coefficients to discard those coming from singular bootstrap samples |
| wwetoile | values of the Wstar matrix in the original fit |
| ifbootfail | value to return if the estimation fails on a bootstrap sample |
estimates on a bootstrap sample or ifbootfail value if the
bootstrap computation fails.
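Resampling the component coordinates while keeping the Wstar matrix of the original fit means that the coefficients of the components have to be mapped back to the predictors. The base R sketch below illustrates that mapping under simplifying assumptions: the weight matrix wstar is a stand-in built from a singular value decomposition rather than from a PLS fit, and the model is an ordinary least squares fit without intercept.

```r
set.seed(7)
n <- 40
X <- scale(matrix(rnorm(n * 4), n, 4))
y <- as.numeric(X %*% c(1, -1, 0.5, 0) + rnorm(n, sd = 0.2))

# Stand-in weight matrix (4 predictors -> 2 components); in a PLS fit this
# role is played by the Wstar matrix of the original model
wstar <- svd(X)$v[, 1:2]
tt <- X %*% wstar                       # component scores

# Statistic computed on a resample of the (y, components) rows
ind <- sample(n, replace = TRUE)
cc <- coef(lm(y[ind] ~ tt[ind, ] - 1))  # coefficients of the components
beta <- as.numeric(wstar %*% cc)        # implied predictor coefficients

# The two parametrisations give identical linear predictors
max(abs(X %*% beta - tt %*% cc))
```

Because the weights stay fixed across resamples, only the small regression on the components is refitted each time.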
Frédéric Bertrand
https://fbertran.github.io/homepage/
See also bootplsglm
data(Cornell)

# (Y,X) bootstrap of a PLSGLR model
# statistic=coefs.plsRglm is the default for (Y,X) bootstrap of PLSGLR models.
set.seed(250)
modplsglm <- plsRglm(Y~.,data=Cornell,1,modele="pls-glm-family",family=gaussian)
Cornell.bootYT <- bootplsglm(modplsglm, R=250, statistic=coefs.plsRglmnp, verbose=FALSE)
A function passed to boot to perform bootstrap.
coefs.plsRnp(dataRepYtt, ind, nt, modele, maxcoefvalues, wwetoile, ifbootfail)
| Argument | Description |
|---|---|
| dataRepYtt | components' coordinates to bootstrap |
| ind | indices for resampling |
| nt | number of components to use |
| modele | type of model to use, see plsRglm |
| maxcoefvalues | maximum values allowed for the estimates of the coefficients to discard those coming from singular bootstrap samples |
| wwetoile | values of the Wstar matrix in the original fit |
| ifbootfail | value to return if the estimation fails on a bootstrap sample |
estimates on a bootstrap sample or ifbootfail value if the
bootstrap computation fails.
Frédéric Bertrand
https://fbertran.github.io/homepage/
See also bootpls
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]

# Lazraq-Cleroux PLS (Y,X) bootstrap
# statistic=coefs.plsR is the default for (Y,X) resampling of PLSR models.
set.seed(250)
modpls <- plsR(yCornell,XCornell,1)
Cornell.bootYT <- bootpls(modpls, R=250, typeboot="fmodel_np",
statistic=coefs.plsRnp, verbose=FALSE)
This function is a wrapper for boot.ci to derive
bootstrap-based confidence intervals from a "boot" object.
confints.bootpls(bootobject, indices = NULL, typeBCa = TRUE)
| Argument | Description |
|---|---|
| bootobject | an object of class "boot" |
| indices | the indices of the predictors for which CIs should be calculated. Defaults to NULL |
| typeBCa | should BCa bootstrap-based CIs be derived? Defaults to TRUE |
Matrix with the limits of the bootstrap-based CIs for all predictors (the default) or
only for the selected predictors (indices option). The limits are given
in the following order: Normal Lower then Upper Limit, Basic Lower then Upper Limit,
Percentile Lower then Upper Limit, BCa Lower then Upper Limit.
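To make the first three interval types concrete, the base R sketch below computes normal, basic and percentile limits by hand from a vector of bootstrap replicates. It is only an illustration on simulated data: the sample x and the statistic (a mean) are assumptions, the quantile interpolation differs slightly from the one used by boot.ci, and the BCa interval is left out.

```r
set.seed(250)
x <- rexp(60, rate = 1 / 3)            # hypothetical sample
t0 <- mean(x)                          # estimate on the original data
tstar <- replicate(999, mean(sample(x, replace = TRUE)))

alpha <- 0.05
q <- quantile(tstar, c(alpha / 2, 1 - alpha / 2), names = FALSE)
bias <- mean(tstar) - t0
z <- qnorm(1 - alpha / 2)

ci <- rbind(
  normal     = t0 - bias + c(-1, 1) * z * sd(tstar),  # bias-corrected normal
  basic      = 2 * t0 - rev(q),                       # reflected quantiles
  percentile = q                                      # raw quantiles
)
colnames(ci) <- c("lower", "upper")
print(ci)
```

Each row follows the lower-then-upper ordering described above, one row per interval type.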
Frédéric Bertrand
https://fbertran.github.io/homepage/
See also bootpls and bootplsglm.
data(Cornell)

#Lazraq-Cleroux PLS (Y,X) bootstrap
set.seed(250)
modpls <- plsR(Y~.,data=Cornell,3)
Cornell.bootYX <- bootpls(modpls, R=250, verbose=FALSE)
confints.bootpls(Cornell.bootYX,2:8)
confints.bootpls(Cornell.bootYX,2:8,typeBCa=FALSE)
A correlation matrix to simulate datasets
A data frame with 17 observations on the following 17 variables.
Each of the 17 variables is a numeric vector.
Handmade.
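One common way to simulate a dataset from a correlation matrix is to multiply independent standard normal draws by a Cholesky factor of that matrix. The sketch below uses a hypothetical 3 x 3 matrix rather than CorMat itself, and nothing in the source states that this is the simulation scheme used by the package.

```r
# Hypothetical 3 x 3 correlation matrix (CorMat itself is 17 x 17)
R <- matrix(c(1.0, 0.6, 0.3,
              0.6, 1.0, 0.5,
              0.3, 0.5, 1.0), nrow = 3, byrow = TRUE)

set.seed(123)
n <- 5000
Z <- matrix(rnorm(n * 3), n, 3)   # independent standard normal columns
U <- chol(R)                      # upper triangular, t(U) %*% U equals R
sim <- Z %*% U                    # columns now have correlation close to R

round(cor(sim), 2)
```

The empirical correlations approach the target matrix as the number of simulated rows grows.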
Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(CorMat)
str(CorMat)
The famous Cornell dataset. A mixture experiment on X1, X2,
X3, X4, X5, X6 and X7 to analyse octane
degree (Y) in gasoline.
A data frame with 12 observations on the following 8 variables.
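| Variable | Description |
|---|---|
| X1 | a numeric vector |
| X2 | a numeric vector |
| X3 | a numeric vector |
| X4 | a numeric vector |
| X5 | a numeric vector |
| X6 | a numeric vector |
| X7 | a numeric vector |
| Y | response value: a numeric vector |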
M. Tenenhaus. (1998). La regression PLS, Theorie et pratique. Editions Technip, Paris.
N. Kettaneh-Wold. (1992). Analysis of mixture data with partial least squares. Chemometrics and Intelligent Laboratory Systems, 14(1):57-69.
data(Cornell)
str(Cornell)
This function implements k-fold cross-validation on complete or incomplete datasets for partial least squares regression models
cv.plsR(object, ...)

## Default S3 method:
cv.plsRmodel(object,dataX,nt=2,limQ2set=.0975,modele="pls",
  K=5, NK=1, grouplist=NULL, random=TRUE, scaleX=TRUE,
  scaleY=NULL, keepcoeffs=FALSE, keepfolds=FALSE, keepdataY=TRUE,
  keepMclassed=FALSE, tol_Xi=10^(-12), weights, verbose=TRUE,...)

## S3 method for class 'formula'
cv.plsRmodel(object,data=NULL,nt=2,limQ2set=.0975,modele="pls",
  K=5, NK=1, grouplist=NULL, random=TRUE, scaleX=TRUE,
  scaleY=NULL, keepcoeffs=FALSE, keepfolds=FALSE, keepdataY=TRUE,
  keepMclassed=FALSE, tol_Xi=10^(-12), weights,subset,contrasts=NULL,
  verbose=TRUE,...)

PLS_lm_kfoldcv(dataY, dataX, nt = 2, limQ2set = 0.0975, modele = "pls",
  K = 5, NK = 1, grouplist = NULL, random = TRUE, scaleX = TRUE,
  scaleY = NULL, keepcoeffs = FALSE, keepfolds = FALSE, keepdataY = TRUE,
  keepMclassed=FALSE, tol_Xi = 10^(-12), weights, verbose=TRUE)

PLS_lm_kfoldcv_formula(formula,data=NULL,nt=2,limQ2set=.0975,modele="pls",
  K=5, NK=1, grouplist=NULL, random=TRUE, scaleX=TRUE,
  scaleY=NULL, keepcoeffs=FALSE, keepfolds=FALSE, keepdataY=TRUE,
  keepMclassed=FALSE, tol_Xi=10^(-12), weights,subset,contrasts=NULL,
  verbose=TRUE)
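| Argument | Description |
|---|---|
| object | response (training) dataset or an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted |
| dataY | response (training) dataset |
| dataX | predictor(s) (training) dataset |
| formula | an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted |
| data | an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model |
| nt | number of components to be extracted |
| limQ2set | limit value for the Q2 |
| modele | name of the PLS model to be fitted; only "pls" is available for this function |
| K | number of groups. Defaults to 5. |
| NK | number of times the group division is made |
| grouplist | to specify the members of the K groups |
| random | should the K groups be formed randomly? Defaults to TRUE |
| scaleX | scale the predictor(s): must be set to TRUE for modele="pls" and should be for glms pls |
| scaleY | scale the response: Yes/No. Ignored since not always possible for glm responses. |
| keepcoeffs | should the coefficients for each model be returned |
| keepfolds | should the groups' composition be returned |
| keepdataY | should the observed value of the response for each one of the predicted values be returned |
| keepMclassed | should the number of misclassified observations be returned |
| tol_Xi | minimal value for Norm2(Xi) and det(pp' * pp) if there is any missing value in dataX. Defaults to 10^(-12) |
| weights | an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector. |
| subset | an optional vector specifying a subset of observations to be used in the fitting process. |
| contrasts | an optional list. See the contrasts.arg of model.matrix.default. |
| verbose | should info messages be displayed? |
| ... | arguments to pass to cv.plsRmodel.default or to cv.plsRmodel.formula |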
Each group is predicted in turn from a model fitted on the K-1 other groups. Leave-one-out cross-validation is thus obtained for K==nrow(dataX).
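The fold logic can be sketched in a few lines of base R. The example below is not the package's implementation: it uses lm instead of a PLS model, a hypothetical helper named kfold_press, and simulated data, and it only accumulates the predicted residual sum of squares (PRESS) over the K held-out groups.

```r
set.seed(1)
n <- 24
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 + dat$x1 - 0.5 * dat$x2 + rnorm(n, sd = 0.3)

kfold_press <- function(data, K) {
  # Random assignment of the rows to K groups of (nearly) equal size
  fold <- sample(rep(seq_len(K), length.out = nrow(data)))
  press <- 0
  for (k in seq_len(K)) {
    held_out <- fold == k
    fit <- lm(y ~ x1 + x2, data = data[!held_out, ])   # fit on K-1 groups
    pred <- predict(fit, newdata = data[held_out, ])   # predict the K-th
    press <- press + sum((data$y[held_out] - pred)^2)
  }
  press
}

press_5fold <- kfold_press(dat, K = 5)
press_loo   <- kfold_press(dat, K = nrow(dat))  # leave-one-out
c(press_5fold = press_5fold, press_loo = press_loo)
```

With K equal to the number of rows every group holds a single observation, so the random assignment no longer matters and the result is the leave-one-out PRESS.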
A typical predictor has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with any duplicates removed.
A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second.
The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on: to avoid this pass a terms object as the formula.
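These formula rules can be checked directly with terms from base R, without fitting any model. The variable names y, a and b below are placeholders.

```r
# Term labels generated by the operators described above
plus_terms  <- attr(terms(y ~ a + b), "term.labels")   # two main effects
inter_terms <- attr(terms(y ~ a:b), "term.labels")     # interaction only
cross_terms <- attr(terms(y ~ a * b), "term.labels")   # a, b and a:b
print(cross_terms)

# Main effects are moved ahead of interactions, whatever the input order
reordered <- attr(terms(y ~ a:b + b + a), "term.labels")
print(reordered)

# keep.order = TRUE builds a terms object that preserves the written order
kept <- attr(terms(y ~ a:b + b + a, keep.order = TRUE), "term.labels")
print(kept)
```

Building the terms object with keep.order = TRUE is one way to obtain a terms object that keeps the order in which the terms were written.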
Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations.
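The second interpretation of the weights can be verified on a small example: fitting group means with weights equal to the group sizes gives the same coefficients as fitting every unit-weight observation. The sketch uses lm and simulated data, which are assumptions made for illustration.

```r
set.seed(3)
# Replicated design: each x value is observed w_i times
x_levels <- c(1, 2, 3, 4)
w <- c(2L, 5L, 3L, 4L)
x_full <- rep(x_levels, times = w)
y_full <- 1 + 2 * x_full + rnorm(length(x_full))

# Collapse to one mean response per x value
y_mean <- as.numeric(tapply(y_full, x_full, mean))

fit_full     <- lm(y_full ~ x_full)                   # unit weights, all rows
fit_weighted <- lm(y_mean ~ x_levels, weights = w)    # means, weights = counts

rbind(full = coef(fit_full), weighted = coef(fit_weighted))
```

Only the coefficient estimates coincide; residual degrees of freedom and standard errors differ because the collapsed fit sees fewer rows.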
An object of class "cv.plsRmodel".
| Component | Description |
|---|---|
| results_kfolds | list of |
| folds | list of |
| dataY_kfolds | list of |
| call | the call of the function |
Works for complete and incomplete datasets.
Frederic Bertrand
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frederic Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
Summary method summary.cv.plsRmodel. kfolds2coeff, kfolds2Pressind, kfolds2Press, kfolds2Mclassedind, kfolds2Mclassed and kfolds2CVinfos_lm to extract and transform results from k-fold cross-validation.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]

#Leave one out CV (K=nrow(Cornell)) one time (NK=1)
bbb <- cv.plsR(object=yCornell,dataX=XCornell,nt=6,K=nrow(Cornell),NK=1)
bbb2 <- cv.plsR(Y~.,data=Cornell,nt=6,K=12,NK=1,verbose=FALSE)
(sum1<-summary(bbb2))

#6-fold CV (K=6) two times (NK=2)
#use random=TRUE to randomly create folds for repeated CV
bbb3 <- cv.plsR(object=yCornell,dataX=XCornell,nt=6,K=6,NK=2)
bbb4 <- cv.plsR(Y~.,data=Cornell,nt=6,K=6,NK=2,verbose=FALSE)
(sum3<-summary(bbb4))

cvtable(sum1)
cvtable(sum3)
rm(list=c("XCornell","yCornell","bbb","bbb2","bbb3","bbb4"))
This function implements k-fold cross-validation on complete or incomplete datasets for partial least squares regression generalized linear models
cv.plsRglm(object, ...)

## Default S3 method:
cv.plsRglmmodel(object,dataX,nt=2,limQ2set=.0975,
  modele="pls", family=NULL, K=5, NK=1, grouplist=NULL, random=TRUE,
  scaleX=TRUE, scaleY=NULL, keepcoeffs=FALSE, keepfolds=FALSE,
  keepdataY=TRUE, keepMclassed=FALSE, tol_Xi=10^(-12), weights, method,
  fit_backend="stats",verbose=TRUE,...)

## S3 method for class 'formula'
cv.plsRglmmodel(object,data=NULL,nt=2,limQ2set=.0975,
  modele="pls", family=NULL, K=5, NK=1, grouplist=NULL, random=TRUE,
  scaleX=TRUE, scaleY=NULL, keepcoeffs=FALSE, keepfolds=FALSE,
  keepdataY=TRUE, keepMclassed=FALSE, tol_Xi=10^(-12),weights,subset,
  start=NULL,etastart,mustart,offset,method,control= list(),contrasts=NULL,
  fit_backend="stats",verbose=TRUE,...)

PLS_glm_kfoldcv(dataY, dataX, nt = 2, limQ2set = 0.0975, modele = "pls",
  family = NULL, K = 5, NK = 1, grouplist = NULL, random = TRUE,
  scaleX = TRUE, scaleY = NULL, keepcoeffs = FALSE, keepfolds = FALSE,
  keepdataY = TRUE, keepMclassed=FALSE, tol_Xi = 10^(-12), weights, method,
  fit_backend="stats",verbose=TRUE)

PLS_glm_kfoldcv_formula(formula,data=NULL,nt=2,limQ2set=.0975,modele="pls",
  family=NULL, K=5, NK=1, grouplist=NULL, random=TRUE, scaleX=TRUE,
  scaleY=NULL, keepcoeffs=FALSE, keepfolds=FALSE, keepdataY=TRUE,
  keepMclassed=FALSE, tol_Xi=10^(-12),weights,subset,start=NULL,etastart,
  mustart,offset,method,control= list(),contrasts=NULL, fit_backend="stats",
  verbose=TRUE)
| Argument | Description |
|---|---|
| object | response (training) dataset or an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted |
| dataY | response (training) dataset |
| dataX | predictor(s) (training) dataset |
| formula | an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted |
| data | an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model |
| nt | number of components to be extracted |
| limQ2set | limit value for the Q2 |
| modele | name of the PLS glm model to be fitted (see the list of predefined models in the details) |
| family | a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See family for details of family functions.) |
| K | number of groups. Defaults to 5. |
| NK | number of times the group division is made |
| grouplist | to specify the members of the K groups |
| random | should the K groups be formed randomly? Defaults to TRUE |
| scaleX | scale the predictor(s): must be set to TRUE for modele="pls" and should be for glms pls |
| scaleY | scale the response: Yes/No. Ignored since not always possible for glm responses. |
| keepcoeffs | should the coefficients for each model be returned |
| keepfolds | should the groups' composition be returned |
| keepdataY | should the observed value of the response for each one of the predicted values be returned |
| keepMclassed | should the number of misclassified observations be returned (unavailable) |
| tol_Xi | minimal value for Norm2(Xi) and det(pp' * pp) if there is any missing value in dataX. Defaults to 10^(-12) |
| weights | an optional vector of 'prior weights' to be used in the fitting process. Should be NULL or a numeric vector. |
| fit_backend | backend used for repeated non-ordinal score-space GLM fits during cross-validation. Use |
| subset | an optional vector specifying a subset of observations to be used in the fitting process. |
| start | starting values for the parameters in the linear predictor. |
| etastart | starting values for the linear predictor. |
| mustart | starting values for the vector of means. |
| offset | this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. |
| method | For non-ordinal GLM modes this argument is kept for backward compatibility; use |
| control | a list of parameters for controlling the fitting process. For glm.fit this is passed to glm.control. |
| contrasts | an optional list. See the contrasts.arg of model.matrix.default. |
| verbose | should info messages be displayed? |
| ... | arguments to pass to cv.plsRglmmodel.default or to cv.plsRglmmodel.formula |
Each group is predicted in turn from a model fitted on the K-1 other groups. Leave-one-out cross-validation is thus obtained for K==nrow(dataX).
There are seven different predefined models with predefined link functions available:

| Model | Description |
|---|---|
| "pls" | ordinary pls models |
| "pls-glm-Gamma" | glm Gamma with inverse link pls models |
| "pls-glm-gaussian" | glm gaussian with identity link pls models |
| "pls-glm-inverse.gaussian" | glm inverse gaussian with 1/mu^2 link pls models |
| "pls-glm-logistic" | glm binomial with logit link pls models |
| "pls-glm-poisson" | glm poisson with log link pls models |
| "pls-glm-polr" | glm polr with logit link pls models |
Using the "family=" option and setting "modele=pls-glm-family" allows changing the family and link function the same way as for the glm function. As a consequence user-specified families can also be used.
| Family | Accepted links |
|---|---|
| gaussian family | accepts the links (as names) identity, log and inverse. |
| binomial family | accepts the links logit, probit, cauchit, (corresponding to logistic, normal and Cauchy CDFs respectively) log and cloglog (complementary log-log). |
| Gamma family | accepts the links inverse, identity and log. |
| poisson family | accepts the links log, identity, and sqrt. |
| inverse.gaussian family | accepts the links 1/mu^2, inverse, identity and log. |
| quasi family | accepts the links logit, probit, cloglog, identity, inverse, log, 1/mu^2 and sqrt. |
| power | can be used to create a power link function. |
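Family objects of this kind are plain base R objects and can be inspected before being handed to a fitting function. The sketch below only uses functions from the stats package; the object names are placeholders.

```r
# A family given as a function call, with a non-default link
fam_probit <- binomial(link = "probit")
fam_probit$family
fam_probit$link

# Links can also be passed unquoted, as in the examples of this manual
fam_pois <- poisson(log)
fam_pois$link

# power() builds a power link object, here combined with a quasi family
fam_quasi <- quasi(link = power(1 / 3), variance = "mu")

# The link and its inverse are stored as functions
eta <- fam_probit$linkfun(0.975)   # probit link: qnorm(0.975)
fam_probit$linkinv(eta)            # maps back to 0.975
fam_quasi$linkfun(8)               # cube root link: 8^(1/3)
```

Any of these objects could then be supplied through the family argument together with modele="pls-glm-family".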
arguments to pass to cv.plsRglmmodel.default or to cv.plsRglmmodel.formula
A typical predictor has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with any duplicates removed.
A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second.
The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on: to avoid this pass a terms object as the formula.
Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations.
An object of class "cv.plsRglmmodel".
| Component | Description |
|---|---|
| results_kfolds | list of |
| folds | list of |
| dataY_kfolds | list of |
| fit_backend | backend used for repeated non-ordinal score-space GLM fits during cross-validation |
| call | the call of the function |
Works for complete and incomplete datasets.
Frederic Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frederic Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18.
Summary method summary.cv.plsRglmmodel. kfolds2coeff, kfolds2Pressind, kfolds2Press, kfolds2Mclassedind, kfolds2Mclassed and summary to extract and transform results from k-fold cross validation.
data(Cornell)
bbb <- cv.plsRglm(Y~.,data=Cornell,nt=10)
(sum1<-summary(bbb))
cvtable(sum1)

bbb2 <- cv.plsRglm(Y~.,data=Cornell,nt=3,
  modele="pls-glm-family",family=gaussian(),K=12,verbose=FALSE)
(sum2<-summary(bbb2))
cvtable(sum2)

#random=TRUE is the default to randomly create folds for repeated CV
bbb3 <- cv.plsRglm(Y~.,data=Cornell,nt=3,
  modele="pls-glm-family",family=gaussian(),K=6,NK=10, verbose=FALSE)
(sum3<-summary(bbb3))
plot(cvtable(sum3))

data(aze_compl)
bbb <- cv.plsRglm(y~.,data=aze_compl,nt=10,K=10,modele="pls",keepcoeffs=TRUE, verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb)
bbb2 <- cv.plsRglm(y~.,data=aze_compl,nt=10,K=10,modele="pls-glm-family",
  family=binomial(probit),keepcoeffs=TRUE, verbose=FALSE)
bbb2 <- cv.plsRglm(y~.,data=aze_compl,nt=10,K=10,
  modele="pls-glm-logistic",keepcoeffs=TRUE, verbose=FALSE)
summary(bbb,MClassed=TRUE)
summary(bbb2,MClassed=TRUE)
kfolds2coeff(bbb2)

kfolds2Chisqind(bbb2)
kfolds2Chisq(bbb2)
summary(bbb2)
rm(list=c("bbb","bbb2"))

data(pine)
Xpine<-pine[,1:10]
ypine<-pine[,11]
bbb <- cv.plsRglm(round(x11)~.,data=pine,nt=10,modele="pls-glm-family",
  family=poisson(log),K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)
bbb <- cv.plsRglm(round(x11)~.,data=pine,nt=10,
  modele="pls-glm-poisson",K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb)
boxplot(kfolds2coeff(bbb)[,1])

kfolds2Chisqind(bbb)
kfolds2Chisq(bbb)
summary(bbb)
PLS_lm(ypine,Xpine,10,typeVC="standard")$InfCrit

data(pineNAX21)
bbb2 <- cv.plsRglm(round(x11)~.,data=pineNAX21,nt=10,
  modele="pls-glm-family",family=poisson(log),K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)
bbb2 <- cv.plsRglm(round(x11)~.,data=pineNAX21,nt=10,
  modele="pls-glm-poisson",K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb2)
boxplot(kfolds2coeff(bbb2)[,1])

kfolds2Chisqind(bbb2)
kfolds2Chisq(bbb2)
summary(bbb2)

data(XpineNAX21)
PLS_lm(ypine,XpineNAX21,10,typeVC="standard")$InfCrit
rm(list=c("Xpine","XpineNAX21","ypine","bbb","bbb2"))

data(pine)
Xpine<-pine[,1:10]
ypine<-pine[,11]
bbb <- cv.plsRglm(x11~.,data=pine,nt=10,modele="pls-glm-family",
  family=Gamma,K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)
bbb <- cv.plsRglm(x11~.,data=pine,nt=10,modele="pls-glm-Gamma",
  K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb)
boxplot(kfolds2coeff(bbb)[,1])

kfolds2Chisqind(bbb)
kfolds2Chisq(bbb)
summary(bbb)
PLS_lm(ypine,Xpine,10,typeVC="standard")$InfCrit

data(pineNAX21)
bbb2 <- cv.plsRglm(x11~.,data=pineNAX21,nt=10,
  modele="pls-glm-family",family=Gamma(),K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)
bbb2 <- cv.plsRglm(x11~.,data=pineNAX21,nt=10,
  modele="pls-glm-Gamma",K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb2)
boxplot(kfolds2coeff(bbb2)[,1])

kfolds2Chisqind(bbb2)
kfolds2Chisq(bbb2)
summary(bbb2)
XpineNAX21 <- Xpine
XpineNAX21[1,2] <- NA
PLS_lm(ypine,XpineNAX21,10,typeVC="standard")$InfCrit
rm(list=c("Xpine","XpineNAX21","ypine","bbb","bbb2"))

data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
bbb <- cv.plsRglm(Y~.,data=Cornell,nt=10,NK=1,modele="pls",verbose=FALSE)
summary(bbb)

cv.plsRglm(object=yCornell,dataX=XCornell,nt=3,modele="pls-glm-inverse.gaussian",K=12,verbose=FALSE)
cv.plsRglm(object=yCornell,dataX=XCornell,nt=3,modele="pls-glm-family",
  family=inverse.gaussian,K=12,verbose=FALSE)
cv.plsRglm(object=yCornell,dataX=XCornell,nt=3,modele="pls-glm-inverse.gaussian",K=6,
  NK=2,verbose=FALSE)$results_kfolds
cv.plsRglm(object=yCornell,dataX=XCornell,nt=3,modele="pls-glm-family",family=inverse.gaussian(),
  K=6,NK=2,verbose=FALSE)$results_kfolds
cv.plsRglm(object=yCornell,dataX=XCornell,nt=3,modele="pls-glm-inverse.gaussian",K=6,
  NK=2,verbose=FALSE)$results_kfolds
cv.plsRglm(object=yCornell,dataX=XCornell,nt=3,modele="pls-glm-family",
  family=inverse.gaussian(link = "1/mu^2"),K=6,NK=2,verbose=FALSE)$results_kfolds

bbb2 <- cv.plsRglm(Y~.,data=Cornell,nt=10,
  modele="pls-glm-inverse.gaussian",keepcoeffs=TRUE,verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb2)
boxplot(kfolds2coeff(bbb2)[,1])

kfolds2Chisqind(bbb2)
kfolds2Chisq(bbb2)
summary(bbb2)
PLS_lm(yCornell,XCornell,10,typeVC="standard")$InfCrit
rm(list=c("XCornell","yCornell","bbb","bbb2"))

data(Cornell)
bbb <- cv.plsRglm(Y~.,data=Cornell,nt=10,NK=1,modele="pls")
summary(bbb)

cv.plsRglm(Y~.,data=Cornell,nt=3,modele="pls-glm-family",family=gaussian(),K=12)

cv.plsRglm(Y~.,data=Cornell,nt=3,modele="pls-glm-family",family=gaussian(),K=6,
  NK=2,random=TRUE,keepfolds=TRUE,verbose=FALSE)$results_kfolds

#Different ways of model specifications
cv.plsRglm(Y~.,data=Cornell,nt=3,modele="pls-glm-family",family=gaussian(),K=6,
  NK=2,verbose=FALSE)$results_kfolds
cv.plsRglm(Y~.,data=Cornell,nt=3,modele="pls-glm-family",family=gaussian,
  K=6,NK=2,verbose=FALSE)$results_kfolds
cv.plsRglm(Y~.,data=Cornell,nt=3,modele="pls-glm-family",family=gaussian(),
  K=6,NK=2,verbose=FALSE)$results_kfolds
cv.plsRglm(Y~.,data=Cornell,nt=3,modele="pls-glm-family",family=gaussian(link=log),
  K=6,NK=2,verbose=FALSE)$results_kfolds

bbb2 <- cv.plsRglm(Y~.,data=Cornell,nt=10,
  modele="pls-glm-gaussian",keepcoeffs=TRUE,verbose=FALSE)
bbb2 <- cv.plsRglm(Y~.,data=Cornell,nt=3,modele="pls-glm-family",
  family=gaussian(link=log),K=6,keepcoeffs=TRUE,verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb2)
boxplot(kfolds2coeff(bbb2)[,1])

kfolds2Chisqind(bbb2)
kfolds2Chisq(bbb2)
summary(bbb2)
PLS_lm_formula(Y~.,data=Cornell,10,typeVC="standard")$InfCrit
rm(list=c("bbb","bbb2"))

data(pine)
bbb <- cv.plsRglm(x11~.,data=pine,nt=10,modele="pls-glm-family",
  family=gaussian(log),K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)
bbb <- cv.plsRglm(x11~.,data=pine,nt=10,modele="pls-glm-family",family=gaussian(),
  K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb)
boxplot(kfolds2coeff(bbb)[,1])

kfolds2Chisqind(bbb)
kfolds2Chisq(bbb)
summary(bbb)
PLS_lm_formula(x11~.,data=pine,nt=10,typeVC="standard")$InfCrit

data(pineNAX21)
bbb2 <- cv.plsRglm(x11~.,data=pineNAX21,nt=10,
  modele="pls-glm-family",family=gaussian(log),K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)
bbb2 <- cv.plsRglm(x11~.,data=pineNAX21,nt=10,
  modele="pls-glm-gaussian",K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb2)
boxplot(kfolds2coeff(bbb2)[,1])

kfolds2Chisqind(bbb2)
kfolds2Chisq(bbb2)
summary(bbb2)
PLS_lm_formula(x11~.,data=pineNAX21,nt=10,typeVC="standard")$InfCrit
rm(list=c("bbb","bbb2"))

data(aze_compl)
bbb <- cv.plsRglm(y~.,data=aze_compl,nt=10,K=10,modele="pls",
  keepcoeffs=TRUE,verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb)
bbb2 <- cv.plsRglm(y~.,data=aze_compl,nt=3,K=10,
  modele="pls-glm-family",family=binomial(probit),keepcoeffs=TRUE,verbose=FALSE)
bbb2 <- cv.plsRglm(y~.,data=aze_compl,nt=3,K=10,
  modele="pls-glm-logistic",keepcoeffs=TRUE,verbose=FALSE)
summary(bbb,MClassed=TRUE)
summary(bbb2,MClassed=TRUE)
kfolds2coeff(bbb2)

kfolds2Chisqind(bbb2)
kfolds2Chisq(bbb2)
summary(bbb2)
rm(list=c("bbb","bbb2"))

data(pine)
bbb <- cv.plsRglm(round(x11)~.,data=pine,nt=10,
  modele="pls-glm-family",family=poisson(log),K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)
bbb <- cv.plsRglm(round(x11)~.,data=pine,nt=10,
  modele="pls-glm-poisson",K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb)
boxplot(kfolds2coeff(bbb)[,1])

kfolds2Chisqind(bbb)
kfolds2Chisq(bbb)
summary(bbb)
PLS_lm_formula(x11~.,data=pine,10,typeVC="standard")$InfCrit

data(pineNAX21)
bbb2 <- cv.plsRglm(round(x11)~.,data=pineNAX21,nt=10,
  modele="pls-glm-family",family=poisson(log),K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)
bbb2 <- cv.plsRglm(round(x11)~.,data=pineNAX21,nt=10,
  modele="pls-glm-poisson",K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb2)
boxplot(kfolds2coeff(bbb2)[,1])

kfolds2Chisqind(bbb2)
kfolds2Chisq(bbb2)
summary(bbb2)
PLS_lm_formula(x11~.,data=pineNAX21,10,typeVC="standard")$InfCrit
rm(list=c("bbb","bbb2"))

data(pine)
bbb <- cv.plsRglm(x11~.,data=pine,nt=10,modele="pls-glm-family",
  family=Gamma,K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)
bbb <- cv.plsRglm(x11~.,data=pine,nt=10,modele="pls-glm-Gamma",
  K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb)
boxplot(kfolds2coeff(bbb)[,1])

kfolds2Chisqind(bbb)
kfolds2Chisq(bbb)
summary(bbb)
PLS_lm_formula(x11~.,data=pine,10,typeVC="standard")$InfCrit

data(pineNAX21)
bbb2 <- cv.plsRglm(x11~.,data=pineNAX21,nt=10,
  modele="pls-glm-family",family=Gamma(),K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)
bbb2 <- cv.plsRglm(x11~.,data=pineNAX21,nt=10,
  modele="pls-glm-Gamma",K=10,keepcoeffs=TRUE,keepfolds=FALSE,verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb2)
boxplot(kfolds2coeff(bbb2)[,1])

kfolds2Chisqind(bbb2)
kfolds2Chisq(bbb2)
summary(bbb2)
PLS_lm_formula(x11~.,data=pineNAX21,10,typeVC="standard")$InfCrit
rm(list=c("bbb","bbb2"))

data(Cornell)
summary(cv.plsRglm(Y~.,data=Cornell,nt=10,NK=1,modele="pls",verbose=FALSE))

cv.plsRglm(Y~.,data=Cornell,nt=3,
  modele="pls-glm-inverse.gaussian",K=12,verbose=FALSE)
cv.plsRglm(Y~.,data=Cornell,nt=3,modele="pls-glm-family",family=inverse.gaussian,K=12,verbose=FALSE)
cv.plsRglm(Y~.,data=Cornell,nt=3,modele="pls-glm-inverse.gaussian",K=6,
  NK=2,verbose=FALSE)$results_kfolds
cv.plsRglm(Y~.,data=Cornell,nt=3,modele="pls-glm-family",
  family=inverse.gaussian(),K=6,NK=2,verbose=FALSE)$results_kfolds
cv.plsRglm(Y~.,data=Cornell,nt=3,modele="pls-glm-inverse.gaussian",K=6,
  NK=2,verbose=FALSE)$results_kfolds
cv.plsRglm(Y~.,data=Cornell,nt=3,modele="pls-glm-family",
  family=inverse.gaussian(link = "1/mu^2"),K=6,NK=2,verbose=FALSE)$results_kfolds

bbb2 <- cv.plsRglm(Y~.,data=Cornell,nt=10,
  modele="pls-glm-inverse.gaussian",keepcoeffs=TRUE,verbose=FALSE)

#For Jackknife computations
kfolds2coeff(bbb2)
boxplot(kfolds2coeff(bbb2)[,1])

kfolds2Chisqind(bbb2)
kfolds2Chisq(bbb2)
summary(bbb2)
PLS_lm_formula(Y~.,data=Cornell,10,typeVC="standard")$InfCrit
rm(list=c("bbb","bbb2"))

data(bordeaux)
summary(cv.plsRglm(Quality~.,data=bordeaux,10,
  modele="pls-glm-polr",K=7))

data(bordeauxNA)
summary(cv.plsRglm(Quality~.,data=bordeauxNA,
  10,modele="pls-glm-polr",K=10,verbose=FALSE))

summary(cv.plsRglm(Quality~.,data=bordeaux,nt=2,K=7,
  modele="pls-glm-polr",method="logistic",verbose=FALSE))
summary(cv.plsRglm(Quality~.,data=bordeaux,nt=2,K=7,
  modele="pls-glm-polr",method="probit",verbose=FALSE))
summary(cv.plsRglm(Quality~.,data=bordeaux,nt=2,K=7,
  modele="pls-glm-polr",method="cloglog",verbose=FALSE))
suppressWarnings(summary(cv.plsRglm(Quality~.,data=bordeaux,nt=2,K=7,
  modele="pls-glm-polr",method="cauchit",verbose=FALSE)))
cv.plsRmulti() performs repeated k-fold cross-validation for the
experimental complete-case linear plsRmulti workflow.
cv.plsRmulti(object, ...)

## Default S3 method:
cv.plsRmultiModel(
  object,
  dataX,
  nt = 2,
  limQ2set = 0.0975,
  modele = "pls",
  family = NULL,
  K = 5,
  NK = 1,
  grouplist = NULL,
  random = TRUE,
  scaleX = TRUE,
  scaleY = NULL,
  keepcoeffs = FALSE,
  keepfolds = FALSE,
  keepdataY = TRUE,
  keepMclassed = FALSE,
  EstimXNA = FALSE,
  pvals.expli = FALSE,
  alpha.pvals.expli = 0.05,
  MClassed = FALSE,
  tol_Xi = 10^(-12),
  weights,
  sparse = FALSE,
  sparseStop = FALSE,
  naive = FALSE,
  verbose = TRUE,
  ...
)

## S3 method for class 'formula'
cv.plsRmultiModel(
  object,
  data = NULL,
  nt = 2,
  limQ2set = 0.0975,
  modele = "pls",
  family = NULL,
  K = 5,
  NK = 1,
  grouplist = NULL,
  random = TRUE,
  scaleX = TRUE,
  scaleY = NULL,
  keepcoeffs = FALSE,
  keepfolds = FALSE,
  keepdataY = TRUE,
  keepMclassed = FALSE,
  EstimXNA = FALSE,
  pvals.expli = FALSE,
  alpha.pvals.expli = 0.05,
  MClassed = FALSE,
  tol_Xi = 10^(-12),
  weights = NULL,
  subset = NULL,
  contrasts = NULL,
  sparse = FALSE,
  sparseStop = FALSE,
  naive = FALSE,
  verbose = TRUE,
  ...
)
object |
For the default method, a numeric multivariate response matrix
or data frame with at least two columns. For the formula method, a formula of
the form |
... |
Not used. Extra arguments are rejected in this experimental release. |
dataX |
Numeric predictor matrix or data frame. |
nt |
Number of components to extract in each fold fit. |
limQ2set |
Threshold used by |
modele |
Only |
family |
Not supported in this experimental release. |
K |
Number of groups for each partition. |
NK |
Number of repeated partitions. |
grouplist |
Optional user-supplied partitions. |
random |
Should the folds be generated randomly? |
scaleX |
Should predictors be scaled? |
scaleY |
Should responses be scaled? Defaults to |
keepcoeffs |
Should standardized coefficient vectors be stored for each fold fit? |
keepfolds |
Should training indices be stored for each fold fit? |
keepdataY |
Kept for interface compatibility. Observed fold responses are stored so that summaries can be computed. |
keepMclassed |
Not supported in this experimental release. |
EstimXNA |
Not supported in this experimental release. |
pvals.expli |
Not supported in this experimental release. |
alpha.pvals.expli |
Not supported in this experimental release. |
MClassed |
Not supported in this experimental release. |
tol_Xi |
Tolerance used for degeneracy checks during component extraction. |
weights |
Not supported in this experimental release. |
sparse |
Not supported in this experimental release. |
sparseStop |
Not supported in this experimental release. |
naive |
Not supported in this experimental release. |
verbose |
Should informational messages be displayed? |
data |
An optional data frame for the formula method. |
subset |
An optional subset for the formula method. |
contrasts |
Optional contrasts for the formula method. |
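To make the roles of K and NK concrete, the sketch below builds repeated random partitions in base R. It is an illustration under stated assumptions, not the package's internal code: make_partitions is a hypothetical helper, and the exact layout expected by grouplist is not specified here, so the list-of-lists structure is an assumption.

```r
# Hypothetical helper (not part of the package): build NK random partitions,
# each splitting the indices 1..n into K folds of near-equal size.
make_partitions <- function(n, K = 5, NK = 1) {
  lapply(seq_len(NK), function(r) {
    fold_id <- sample(rep(seq_len(K), length.out = n))  # shuffled fold labels
    unname(split(seq_len(n), fold_id))                  # list of K index vectors
  })
}

set.seed(123)
parts <- make_partitions(n = 60, K = 3, NK = 2)
length(parts)             # number of repeated partitions (NK)
lengths(parts[[1]])       # fold sizes within the first partition
sort(unlist(parts[[1]]))  # every index 1..60 appears exactly once
```

With random = TRUE the function generates its own folds, so a helper like this would only matter when fixed, reproducible partitions are wanted.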
Only the linear multivariate-response PLS2 mode is supported here. Missing values, weights, sparse extraction options, classification diagnostics, and GLM families remain out of scope for this experimental API.
An object of class "cv.plsRmultiModel" with repeated fold predictions,
observed fold responses, optional coefficient vectors and fold indices, and the
reference full-data "plsRmultiModel" fit used for aggregated summary
metrics.
plsRmulti, summary.cv.plsRmultiModel,
cvtable, bootpls
set.seed(123)
X <- matrix(rnorm(60 * 4), ncol = 4)
Y <- cbind(
  y1 = X[, 1] - 0.5 * X[, 2] + rnorm(60, sd = 0.1),
  y2 = 0.3 * X[, 2] + X[, 3] + rnorm(60, sd = 0.1)
)

cv_fit <- cv.plsRmulti(Y, X, nt = 2, K = 3, NK = 1, verbose = FALSE)
summary(cv_fit, verbose = FALSE)
The function cvtable is a wrapper for cvtable.plsR and
cvtable.plsRglm that provides a table summary for the classes
"summary.cv.plsRmodel" and "summary.cv.plsRglmmodel".
cvtable(x, verbose = TRUE, ...)
x |
an object of the class |
verbose |
Should results be displayed? |
... |
further arguments to be passed to or from methods. |
list |
List of Information Criteria computed for each fold. |
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
cv.modpls <- cv.plsR(Y~.,data=Cornell,nt=6,K=6,NK=5)
res.cv.modpls <- cvtable(summary(cv.modpls))
plot(res.cv.modpls) #defaults to type="CVQ2"
rm(list=c("cv.modpls","res.cv.modpls"))

data(Cornell)
cv.modpls <- cv.plsR(Y~.,data=Cornell,nt=6,K=6,NK=25,verbose=FALSE)
res.cv.modpls <- cvtable(summary(cv.modpls))
plot(res.cv.modpls) #defaults to type="CVQ2"
rm(list=c("cv.modpls","res.cv.modpls"))

data(Cornell)
cv.modpls <- cv.plsR(Y~.,data=Cornell,nt=6,K=6,NK=100,verbose=FALSE)
res.cv.modpls <- cvtable(summary(cv.modpls))
plot(res.cv.modpls) #defaults to type="CVQ2"
rm(list=c("cv.modpls","res.cv.modpls"))

data(Cornell)
cv.modplsglm <- cv.plsRglm(Y~.,data=Cornell,nt=6,K=6,
  modele="pls-glm-gaussian",NK=100,verbose=FALSE)
res.cv.modplsglm <- cvtable(summary(cv.modplsglm))
plot(res.cv.modplsglm) #defaults to type="CVQ2Chi2"
rm(list=c("res.cv.modplsglm"))
This function takes a real value and converts it to 1 if it is positive and to 0 otherwise.
dicho(val)
val |
A real value |
0 or 1.
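The thresholding rule described above can be written in one line of base R. The function dicho_sketch below is a hypothetical stand-in used for illustration; it is not the package's implementation.

```r
# Hypothetical stand-in for the rule described above; not the package's code.
# Strictly positive values map to 1, everything else (including 0) maps to 0.
dicho_sketch <- function(val) ifelse(val > 0, 1, 0)

dicho_sketch(c(-1.5, 0, 2.3))                    # 0 0 1
dicho_sketch(matrix(c(-1, 1, 2, -3), nrow = 2))  # matrix shape is preserved
```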
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
dimX <- 6
Astar <- 4
(dataAstar4 <- t(replicate(10,simul_data_YX(dimX,Astar))))

dicho(dataAstar4)

rm(list=c("dimX","Astar"))
A classic dataset from Fowlkes.
A data frame with 9949 observations on the following 13 variables.
binary response
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
data(fowlkes)
str(fowlkes)
This function computes information criteria for an existing plsR model using Degrees of Freedom estimation.
infcrit.dof(modplsR, naive = FALSE)
modplsR |
A plsR model i.e. an object returned by one of the functions
|
naive |
A boolean. |
If naive=FALSE returns AIC, BIC and gmdl values for estimated and
naive degrees of freedom. If naive=TRUE returns NULL.
matrix |
AIC, BIC and gmdl values or |
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
M. Hansen, B. Yu. (2001). Model Selection and Minimum Description
Length Principle, Journal of the American Statistical Association,
96, 746-774.
N. Kraemer, M. Sugiyama. (2011). The Degrees of Freedom of
Partial Least Squares Regression. Journal of the American Statistical
Association, 106(494), 697-705.
N. Kraemer, M. Sugiyama, M.L. Braun.
(2009). Lanczos Approximations for the Speedup of Kernel Partial Least
Squares Regression, Proceedings of the Twelfth International
Conference on Artificial Intelligence and Statistics (AISTATS), 272-279.
plsR.dof for degrees of freedom computation and
infcrit.dof for computing information criteria directly from a
previously fitted plsR model.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
modpls <- plsR(yCornell,XCornell,4)
infcrit.dof(modpls)
This function computes Predicted Chisquare for k-fold cross validated partial least squares regression models.
kfolds2Chisq(pls_kfolds)
pls_kfolds |
a k-fold cross validated partial least squares regression glm model |
list |
Total Predicted Chisquare vs number of components for the first group partition |
list() |
... |
list |
Total Predicted Chisquare vs number of components for the last group partition |
Use cv.plsRglm to create k-fold cross validated partial
least squares regression glm models.
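As a rough illustration of what a predicted chi-square under k-fold cross-validation can look like, the sketch below uses a plain binomial `glm()` in place of a PLS-GLM fit. It assumes a Pearson-type statistic, that is, squared held-out residuals divided by the family variance function. Both this definition and the helper name `kfold_pchisq_sketch` are assumptions made for illustration, not the package's implementation.

```r
# Hypothetical helper (not part of plsRglm): Pearson-type predicted
# chi-square per fold, with glm() standing in for a PLS-GLM fit.
kfold_pchisq_sketch <- function(formula, data, K, seed = 1) {
  set.seed(seed)
  # Random assignment of each observation to one of K folds
  fold <- sample(rep(seq_len(K), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  vapply(seq_len(K), function(k) {
    test <- fold == k
    fit <- glm(formula, family = binomial(), data = data[!test, , drop = FALSE])
    # Predicted means on the held-out fold
    mu <- predict(fit, newdata = data[test, , drop = FALSE], type = "response")
    y <- data[[response]][test]
    # Binomial variance function: V(mu) = mu * (1 - mu)
    sum((y - mu)^2 / (mu * (1 - mu)))
  }, numeric(1))
}

pchisq_by_fold <- kfold_pchisq_sketch(am ~ hp, mtcars, K = 4)
pchisq_by_fold        # one value per held-out fold
sum(pchisq_by_fold)   # total over the partition
```

With the package itself, the analogous quantities would be obtained by passing the result of cv.plsRglm to kfolds2Chisq or kfolds2Chisqind.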
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
kfolds2coeff, kfolds2Press,
kfolds2Pressind, kfolds2Chisqind,
kfolds2Mclassedind and kfolds2Mclassed to
extract and transform results from k-fold cross validation.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
bbb <- cv.plsRglm(object=yCornell,dataX=XCornell,nt=3,modele="pls-glm-gaussian",K=16,verbose=FALSE)
bbb2 <- cv.plsRglm(object=yCornell,dataX=XCornell,nt=3,modele="pls-glm-gaussian",K=5,verbose=FALSE)
kfolds2Chisq(bbb)
kfolds2Chisq(bbb2)
rm(list=c("XCornell","yCornell","bbb","bbb2"))
data(pine)
Xpine<-pine[,1:10]
ypine<-pine[,11]
bbb <- cv.plsRglm(object=ypine,dataX=Xpine,nt=4,modele="pls-glm-gaussian",verbose=FALSE)
bbb2 <- cv.plsRglm(object=ypine,dataX=Xpine,nt=10,modele="pls-glm-gaussian",K=10,verbose=FALSE)
kfolds2Chisq(bbb)
kfolds2Chisq(bbb2)
XpineNAX21 <- Xpine
XpineNAX21[1,2] <- NA
bbbNA <- cv.plsRglm(object=ypine,dataX=XpineNAX21,nt=10,modele="pls",K=10,verbose=FALSE)
kfolds2Press(bbbNA)
kfolds2Chisq(bbbNA)
bbbNA2 <- cv.plsRglm(object=ypine,dataX=XpineNAX21,nt=4,modele="pls-glm-gaussian",verbose=FALSE)
bbbNA3 <- cv.plsRglm(object=ypine,dataX=XpineNAX21,nt=10,modele="pls-glm-gaussian",K=10,
  verbose=FALSE)
kfolds2Chisq(bbbNA2)
kfolds2Chisq(bbbNA3)
rm(list=c("Xpine","XpineNAX21","ypine","bbb","bbb2","bbbNA","bbbNA2","bbbNA3"))
data(aze_compl)
Xaze_compl<-aze_compl[,2:34]
yaze_compl<-aze_compl$y
kfolds2Chisq(cv.plsRglm(object=yaze_compl,dataX=Xaze_compl,nt=4,modele="pls-glm-family",
  family="binomial",verbose=FALSE))
kfolds2Chisq(cv.plsRglm(object=yaze_compl,dataX=Xaze_compl,nt=4,modele="pls-glm-logistic",
  verbose=FALSE))
kfolds2Chisq(cv.plsRglm(object=yaze_compl,dataX=Xaze_compl,nt=10,modele="pls-glm-family",
  family=binomial(),K=10,verbose=FALSE))
kfolds2Chisq(cv.plsRglm(object=yaze_compl,dataX=Xaze_compl,nt=10,modele="pls-glm-logistic",
  K=10,verbose=FALSE))
rm(list=c("Xaze_compl","yaze_compl"))
This function computes individual Predicted Chisquare for k-fold cross validated partial least squares regression models.
kfolds2Chisqind(pls_kfolds)
pls_kfolds |
a k-fold cross validated partial least squares regression glm model |
list |
Individual PChisq vs number of components for the first group partition |
list() |
... |
list |
Individual PChisq vs number of components for the last group partition |
Use cv.plsRglm to create k-fold cross validated partial
least squares regression glm models.
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
kfolds2coeff, kfolds2Press,
kfolds2Pressind, kfolds2Chisq,
kfolds2Mclassedind and kfolds2Mclassed to
extract and transform results from k-fold cross-validation.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
bbb <- cv.plsRglm(object=yCornell,dataX=XCornell,nt=3,modele="pls-glm-gaussian",K=16,verbose=FALSE)
bbb2 <- cv.plsRglm(object=yCornell,dataX=XCornell,nt=3,modele="pls-glm-gaussian",K=5,verbose=FALSE)
kfolds2Chisqind(bbb)
kfolds2Chisqind(bbb2)
rm(list=c("XCornell","yCornell","bbb","bbb2"))
data(pine)
Xpine<-pine[,1:10]
ypine<-pine[,11]
bbb <- cv.plsRglm(object=ypine,dataX=Xpine,nt=4,modele="pls-glm-gaussian",verbose=FALSE)
bbb2 <- cv.plsRglm(object=ypine,dataX=Xpine,nt=10,modele="pls-glm-gaussian",K=10,verbose=FALSE)
kfolds2Chisqind(bbb)
kfolds2Chisqind(bbb2)
XpineNAX21 <- Xpine
XpineNAX21[1,2] <- NA
bbbNA <- cv.plsRglm(object=ypine,dataX=XpineNAX21,nt=10,modele="pls",K=10,verbose=FALSE)
kfolds2Pressind(bbbNA)
kfolds2Chisqind(bbbNA)
bbbNA2 <- cv.plsRglm(object=ypine,dataX=XpineNAX21,nt=4,modele="pls-glm-gaussian",verbose=FALSE)
bbbNA3 <- cv.plsRglm(object=ypine,dataX=XpineNAX21,nt=10,modele="pls-glm-gaussian",
  K=10,verbose=FALSE)
kfolds2Chisqind(bbbNA2)
kfolds2Chisqind(bbbNA3)
rm(list=c("Xpine","XpineNAX21","ypine","bbb","bbb2","bbbNA","bbbNA2","bbbNA3"))
data(aze_compl)
Xaze_compl<-aze_compl[,2:34]
yaze_compl<-aze_compl$y
kfolds2Chisqind(cv.plsRglm(object=yaze_compl,dataX=Xaze_compl,nt=4,modele="pls-glm-family",
  family=binomial(),verbose=FALSE))
kfolds2Chisqind(cv.plsRglm(object=yaze_compl,dataX=Xaze_compl,nt=4,modele="pls-glm-logistic",
  verbose=FALSE))
kfolds2Chisqind(cv.plsRglm(object=yaze_compl,dataX=Xaze_compl,nt=10,modele="pls-glm-family",
  family=binomial(),K=10,verbose=FALSE))
kfolds2Chisqind(cv.plsRglm(object=yaze_compl,dataX=Xaze_compl,nt=10,
  modele="pls-glm-logistic",K=10,verbose=FALSE))
rm(list=c("Xaze_compl","yaze_compl"))
This function extracts coefficients from k-fold cross validated partial least squares regression models.
kfolds2coeff(pls_kfolds)
pls_kfolds |
a k-fold cross validated partial least squares regression model, either lm or glm |
This function works for plsR and plsRglm models.
coef.all |
matrix with the values of the coefficients for each
leave one out step or |
Only for NK=1 and leave one out CV
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
kfolds2Pressind, kfolds2Press,
kfolds2Mclassedind, kfolds2Mclassed and
summary to extract and transform
results from k-fold cross validation.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
bbb <- PLS_lm_kfoldcv(dataY=yCornell,dataX=XCornell,nt=3,K=nrow(XCornell),keepcoeffs=TRUE,
  verbose=FALSE)
kfolds2coeff(bbb)
boxplot(kfolds2coeff(bbb)[,2])
rm(list=c("XCornell","yCornell","bbb"))
data(pine)
Xpine<-pine[,1:10]
ypine<-pine[,11]
bbb2 <- cv.plsR(object=ypine,dataX=Xpine,nt=4,K=nrow(Xpine),keepcoeffs=TRUE,verbose=FALSE)
kfolds2coeff(bbb2)
boxplot(kfolds2coeff(bbb2)[,1])
rm(list=c("Xpine","ypine","bbb2"))
This function extracts and computes information criteria and fit statistics for k-fold cross validated partial least squares glm models for both formula and classic specifications of the model.
kfolds2CVinfos_glm(pls_kfolds, MClassed = FALSE, verbose = TRUE)
pls_kfolds |
an object computed using |
MClassed |
should the number of misclassified individuals be computed? |
verbose |
should info messages be displayed? |
The MClassed option should only be set to TRUE if the response is
binary.
list |
table of fit statistics for first group partition |
list() |
... |
list |
table of fit statistics for last group partition |
Use summary and cv.plsRglm instead.
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
kfolds2coeff, kfolds2Pressind,
kfolds2Press, kfolds2Mclassedind and
kfolds2Mclassed to extract and transform results from k-fold
cross-validation.
data(Cornell)
summary(cv.plsRglm(Y~.,data=Cornell,
  nt=6,K=12,NK=1,keepfolds=FALSE,keepdataY=TRUE,modele="pls",verbose=FALSE),MClassed=TRUE)
data(aze_compl)
summary(cv.plsR(y~.,data=aze_compl,nt=10,K=8,modele="pls",verbose=FALSE),
  MClassed=TRUE,verbose=FALSE)
summary(cv.plsRglm(y~.,data=aze_compl,nt=10,K=8,modele="pls",verbose=FALSE),
  MClassed=TRUE,verbose=FALSE)
summary(cv.plsRglm(y~.,data=aze_compl,nt=10,K=8,
  modele="pls-glm-family",
  family=gaussian(),verbose=FALSE),
  MClassed=TRUE,verbose=FALSE)
summary(cv.plsRglm(y~.,data=aze_compl,nt=10,K=8,
  modele="pls-glm-logistic",
  verbose=FALSE),MClassed=TRUE,verbose=FALSE)
summary(cv.plsRglm(y~.,data=aze_compl,nt=10,K=8,
  modele="pls-glm-family",
  family=binomial(),verbose=FALSE),
  MClassed=TRUE,verbose=FALSE)
if(require(chemometrics)){
data(hyptis)
hyptis
yhyptis <- factor(hyptis$Group,ordered=TRUE)
Xhyptis <- as.data.frame(hyptis[,c(1:6)])
options(contrasts = c("contr.treatment", "contr.poly"))
modpls2 <- plsRglm(yhyptis,Xhyptis,6,modele="pls-glm-polr")
modpls2$Coeffsmodel_vals
modpls2$InfCrit
modpls2$Coeffs
modpls2$std.coeffs
table(yhyptis,predict(modpls2$FinalModel,type="class"))
modpls3 <- PLS_glm(yhyptis[-c(1,2,3)],Xhyptis[-c(1,2,3),],3,modele="pls-glm-polr",
  dataPredictY=Xhyptis[c(1,2,3),],verbose=FALSE)
summary(cv.plsRglm(factor(Group,ordered=TRUE)~.,data=hyptis[,-c(7,8)],nt=4,K=10,
  random=TRUE,modele="pls-glm-polr",keepcoeffs=TRUE,verbose=FALSE),
  MClassed=TRUE,verbose=FALSE)
}
This function extracts and computes information criteria and fit statistics for k-fold cross validated partial least squares models for both formula and classic specifications of the model.
kfolds2CVinfos_lm(pls_kfolds, MClassed = FALSE, verbose = TRUE)
pls_kfolds |
an object computed using |
MClassed |
should the number of misclassified individuals be computed? |
verbose |
should info messages be displayed? |
The MClassed option should only be set to TRUE if the response is
binary.
list |
table of fit statistics for first group partition |
list() |
... |
list |
table of fit statistics for last group partition |
Use summary and cv.plsR instead.
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
kfolds2coeff, kfolds2Pressind,
kfolds2Press, kfolds2Mclassedind and
kfolds2Mclassed to extract and transform results from k-fold
cross-validation.
data(Cornell)
summary(cv.plsR(Y~.,data=Cornell,nt=10,K=6,verbose=FALSE))
data(pine)
summary(cv.plsR(x11~.,data=pine,nt=10,NK=3,verbose=FALSE),verbose=FALSE)
data(pineNAX21)
summary(cv.plsR(x11~.,data=pineNAX21,nt=10,NK=3,
  verbose=FALSE),verbose=FALSE)
data(aze_compl)
summary(cv.plsR(y~.,data=aze_compl,nt=10,K=8,NK=3,
  verbose=FALSE),MClassed=TRUE,verbose=FALSE)
This function indicates the total number of misclassified individuals for k-fold cross validated partial least squares regression models.
kfolds2Mclassed(pls_kfolds)
pls_kfolds |
a k-fold cross validated partial least squares regression model used on binary data |
list |
Total number of misclassified individuals vs number of components for the first group partition |
list() |
... |
list |
Total number of misclassified individuals vs number of components for the last group partition |
Use cv.plsR to create k-fold cross validated partial
least squares regression models.
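To make the idea of counting misclassified individuals concrete, here is a minimal base R sketch. It assumes a 0/1 response and a 0.5 cutoff on the predicted values. Both the cutoff and the helper name `count_misclassified_sketch` are illustrative assumptions, not taken from the package source.

```r
# Hypothetical helper (not part of plsRglm): number of observations whose
# predicted class, obtained by thresholding yhat, differs from the 0/1 response.
count_misclassified_sketch <- function(y, yhat, cutoff = 0.5) {
  predicted_class <- as.integer(yhat >= cutoff)
  sum(predicted_class != y)
}

# Toy held-out responses and predictions (made-up numbers)
y    <- c(0, 0, 1, 1, 1, 0)
yhat <- c(0.2, 0.6, 0.7, 0.4, 0.9, 0.1)

n_miss <- count_misclassified_sketch(y, yhat)
n_miss  # observations 2 and 4 fall on the wrong side of the cutoff
```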
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
kfolds2coeff, kfolds2Press,
kfolds2Pressind and kfolds2Mclassedind to
extract and transform results from k-fold cross validation.
data(aze_compl)
Xaze_compl<-aze_compl[,2:34]
yaze_compl<-aze_compl$y
kfolds2Mclassed(cv.plsR(object=yaze_compl,dataX=Xaze_compl,nt=10,K=8,NK=1,verbose=FALSE))
kfolds2Mclassed(cv.plsR(object=yaze_compl,dataX=Xaze_compl,nt=10,K=8,NK=2,verbose=FALSE))
rm(list=c("Xaze_compl","yaze_compl"))
This function indicates the number of misclassified individuals per group for k-fold cross validated partial least squares regression models.
kfolds2Mclassedind(pls_kfolds)
pls_kfolds |
a k-fold cross validated partial least squares regression model used on binary data |
list |
Number of misclassified individuals per group vs number of components for the first group partition |
list() |
... |
list |
Number of misclassified individuals per group vs number of components for the last group partition |
Use cv.plsR or cv.plsRglm to create k-fold
cross validated partial least squares regression models or generalized
linear ones.
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
kfolds2coeff, kfolds2Press,
kfolds2Pressind and kfolds2Mclassed to extract
and transform results from k-fold cross-validation.
data(aze_compl)
Xaze_compl<-aze_compl[,2:34]
yaze_compl<-aze_compl$y
kfolds2Mclassedind(cv.plsR(object=yaze_compl,dataX=Xaze_compl,nt=10,K=8,NK=1,verbose=FALSE))
kfolds2Mclassedind(cv.plsR(object=yaze_compl,dataX=Xaze_compl,nt=10,K=8,NK=2,verbose=FALSE))
rm(list=c("Xaze_compl","yaze_compl"))
This function computes PRESS for k-fold cross validated partial least squares regression models.
kfolds2Press(pls_kfolds)
pls_kfolds |
a k-fold cross validated partial least squares regression model |
list |
Press vs number of components for the first group partition |
list() |
... |
list |
Press vs number of components for the last group partition |
Use cv.plsR to create k-fold cross validated partial
least squares regression models.
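For readers unfamiliar with PRESS (predicted residual sum of squares), the following base R sketch shows the general computation under k-fold cross-validation, with `lm()` standing in for a PLS fit. The helper `kfold_press_sketch` and its arguments are hypothetical and only illustrate the principle. The per-fold vector and its sum are meant to convey the difference between individual and total PRESS.

```r
# Hypothetical helper (not part of plsRglm): k-fold PRESS for a linear model.
kfold_press_sketch <- function(formula, data, K, seed = 1) {
  set.seed(seed)
  # Random assignment of each observation to one of K folds
  fold <- sample(rep(seq_len(K), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  press_by_fold <- vapply(seq_len(K), function(k) {
    test <- fold == k
    # Fit on the K-1 training folds, predict the held-out fold
    fit <- lm(formula, data = data[!test, , drop = FALSE])
    pred <- predict(fit, newdata = data[test, , drop = FALSE])
    sum((data[[response]][test] - pred)^2)
  }, numeric(1))
  list(per_fold = press_by_fold, total = sum(press_by_fold))
}

press <- kfold_press_sketch(mpg ~ wt + hp, mtcars, K = 4)
press$per_fold  # squared prediction error accumulated within each fold
press$total     # PRESS for this partition
```

Repeating the call with different seeds would mimic several group partitions, in the spirit of the NK argument of cv.plsR.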
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
kfolds2coeff, kfolds2Pressind,
kfolds2Mclassedind and kfolds2Mclassed to
extract and transform results from k-fold cross validation.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
kfolds2Press(cv.plsR(object=yCornell,dataX=data.frame(scale(as.matrix(XCornell))[,]),
  nt=6,K=12,NK=1,verbose=FALSE))
kfolds2Press(cv.plsR(object=yCornell,dataX=data.frame(scale(as.matrix(XCornell))[,]),
  nt=6,K=6,NK=1,verbose=FALSE))
rm(list=c("XCornell","yCornell"))
data(pine)
Xpine<-pine[,1:10]
ypine<-pine[,11]
kfolds2Press(cv.plsR(object=ypine,dataX=Xpine,nt=10,NK=1,verbose=FALSE))
kfolds2Press(cv.plsR(object=ypine,dataX=Xpine,nt=10,NK=2,verbose=FALSE))
XpineNAX21 <- Xpine
XpineNAX21[1,2] <- NA
kfolds2Press(cv.plsR(object=ypine,dataX=XpineNAX21,nt=10,NK=1,verbose=FALSE))
kfolds2Press(cv.plsR(object=ypine,dataX=XpineNAX21,nt=10,NK=2,verbose=FALSE))
rm(list=c("Xpine","XpineNAX21","ypine"))
This function computes individual PRESS for k-fold cross validated partial least squares regression models.
kfolds2Pressind(pls_kfolds)
pls_kfolds |
a k-fold cross validated partial least squares regression model |
list |
Individual Press vs number of components for the first group partition |
list() |
... |
list |
Individual Press vs number of components for the last group partition |
Use cv.plsR to create k-fold cross validated partial
least squares regression models.
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
kfolds2coeff, kfolds2Press,
kfolds2Mclassedind and kfolds2Mclassed to
extract and transform results from k-fold cross validation.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
kfolds2Pressind(cv.plsR(object=yCornell,dataX=data.frame(scale(as.matrix(XCornell))[,]),
  nt=6,K=12,NK=1))
kfolds2Pressind(cv.plsR(object=yCornell,dataX=data.frame(scale(as.matrix(XCornell))[,]),
  nt=6,K=6,NK=1))
rm(list=c("XCornell","yCornell"))
data(pine)
Xpine<-pine[,1:10]
ypine<-pine[,11]
kfolds2Pressind(cv.plsR(object=ypine,dataX=Xpine,nt=10,NK=1,verbose=FALSE))
kfolds2Pressind(cv.plsR(object=ypine,dataX=Xpine,nt=10,NK=2,verbose=FALSE))
XpineNAX21 <- Xpine
XpineNAX21[1,2] <- NA
kfolds2Pressind(cv.plsR(object=ypine,dataX=XpineNAX21,nt=10,NK=1,verbose=FALSE))
kfolds2Pressind(cv.plsR(object=ypine,dataX=XpineNAX21,nt=10,NK=2,verbose=FALSE))
rm(list=c("Xpine","XpineNAX21","ypine"))
This function provides loglikelihood computation for a univariate plsR model.
loglikpls(residpls, weights = rep.int(1, length(residpls)))
residpls |
Residuals of a fitted univariate plsR model |
weights |
Weights of observations |
Loglikelihood functions for plsR models with univariate response.
real |
Loglikelihood value |
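The sketch below shows, in base R, how a Gaussian loglikelihood can be computed from residuals alone when the variance is set to its maximum likelihood estimate, `mean(resid^2)`. This is the same quantity that `logLik()` reports for an unweighted `lm` fit. The helper name `gaussian_loglik_sketch` is hypothetical, the weights argument is ignored, and the sketch is not claimed to reproduce the internals of loglikpls.

```r
# Hypothetical helper (not part of plsRglm): unweighted Gaussian loglikelihood
# evaluated at the ML variance estimate mean(resid^2).
gaussian_loglik_sketch <- function(resid) {
  n <- length(resid)
  sigma2_ml <- mean(resid^2)
  -n / 2 * (log(2 * pi) + log(sigma2_ml) + 1)
}

fit <- lm(dist ~ speed, data = cars)
res <- residuals(fit)

ll_closed <- gaussian_loglik_sketch(res)
# Same value obtained by summing normal log-densities of the residuals
ll_dnorm <- sum(dnorm(res, mean = 0, sd = sqrt(mean(res^2)), log = TRUE))
# Reference value from stats::logLik for the linear model
ll_lm <- as.numeric(logLik(fit))

c(closed_form = ll_closed, dnorm_sum = ll_dnorm, logLik_lm = ll_lm)
```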
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Baibing Li, Julian Morris, Elaine B. Martin, Model selection for partial least squares regression, Chemometrics and Intelligent Laboratory Systems 64 (2002) 79-89, doi:10.1016/S0169-7439(02)00051-5.
AICpls for AIC computation and logLik
for loglikelihood computations for linear models
data(pine)
ypine <- pine[,11]
Xpine <- pine[,1:10]
(Pinscaled <- as.data.frame(cbind(scale(ypine),scale(as.matrix(Xpine)))))
colnames(Pinscaled)[1] <- "yy"
lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled)
modpls <- plsR(ypine,Xpine,10)
modpls$Std.Coeffs
lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled)
AIC(lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled))
print(logLik(lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled)))
sum(dnorm(modpls$RepY, modpls$Std.ValsPredictY, sqrt(mean(modpls$residY^2)), log=TRUE))
sum(dnorm(Pinscaled$yy,fitted(lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled)),
  sqrt(mean(residuals(lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled))^2)), log=TRUE))
loglikpls(modpls$residY)
loglikpls(residuals(lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled)))
AICpls(10,residuals(lm(yy~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10,data=Pinscaled)))
AICpls(10,modpls$residY)
A function passed to boot to perform bootstrap.
permcoefs.plsR(dataset, ind, nt, modele, maxcoefvalues, ifbootfail, verbose)
dataset |
dataset to resample |
ind |
indices for resampling |
nt |
number of components to use |
modele |
type of model to use, see plsR |
maxcoefvalues |
maximum values allowed for the estimates of the coefficients to discard those coming from singular bootstrap samples |
ifbootfail |
value to return if the estimation fails on a bootstrap sample |
verbose |
should info messages be displayed ? |
estimates on a bootstrap sample or ifbootfail value if the
bootstrap computation fails.
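The arguments above follow the usual shape of a resampling statistic: a function of the data and an index vector that returns coefficient estimates, or a fallback value when the fit fails or produces implausibly large coefficients. The base R sketch below illustrates that shape with `lm.fit()` standing in for the PLS fit. It assumes the response sits in the first column and that only the response is permuted. Both points, and the helper name `perm_coefs_sketch`, are assumptions made for illustration.

```r
# Hypothetical statistic in (dataset, ind) form; lm.fit() stands in for PLS.
perm_coefs_sketch <- function(dataset, ind, maxcoefvalues, ifbootfail) {
  # Assumption: response in column 1, permuted through ind; predictors untouched
  y_perm <- dataset[ind, 1]
  X <- cbind(Intercept = 1, as.matrix(dataset[, -1, drop = FALSE]))
  coefs <- tryCatch(lm.fit(X, y_perm)$coefficients, error = function(e) NULL)
  # Fall back to ifbootfail when the fit fails or the estimates are too large
  if (is.null(coefs) || anyNA(coefs) || any(abs(coefs) > maxcoefvalues)) {
    return(ifbootfail)
  }
  coefs
}

set.seed(250)
dat <- mtcars[, c("mpg", "wt", "hp")]
fallback <- rep(NA_real_, 3)

# 200 permutation replicates, one row per replicate
perm_draws <- t(replicate(200, perm_coefs_sketch(
  dat, sample(nrow(dat)), maxcoefvalues = 1e5, ifbootfail = fallback
)))

# Identity indices reproduce the fit on the original data
observed <- perm_coefs_sketch(dat, seq_len(nrow(dat)), 1e5, fallback)
```

In practice the package functions are not called directly like this; they are supplied as the statistic argument of bootpls, which handles the resampling.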
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
See also bootpls.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
# Lazraq-Cleroux PLS (Y,X) bootstrap
# statistic=permcoefs.plsR is the default for (Y,X) permutation resampling of PLSR models.
set.seed(250)
modpls <- plsR(yCornell,XCornell,1)
Cornell.bootYX <- bootpls(modpls, sim="permutation", R=250, statistic=permcoefs.plsR, verbose=FALSE)
A function passed to boot to perform bootstrap.
permcoefs.plsR.raw(dataset, ind, nt, modele, maxcoefvalues, ifbootfail, verbose)
dataset |
dataset to resample |
ind |
indices for resampling |
nt |
number of components to use |
modele |
type of model to use, see plsR |
maxcoefvalues |
maximum values allowed for the estimates of the coefficients to discard those coming from singular bootstrap samples |
ifbootfail |
value to return if the estimation fails on a bootstrap sample |
verbose |
should info messages be displayed ? |
estimates on a bootstrap sample or ifbootfail value if the
bootstrap computation fails.
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
See also bootpls.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
# Lazraq-Cleroux PLS (Y,X) bootstrap
set.seed(250)
modpls <- permcoefs.plsR.raw(Cornell[,-8],1:nrow(Cornell),nt=3,
  maxcoefvalues=1e5,ifbootfail=rep(0,3),verbose=FALSE)
A function passed to boot to perform bootstrap.
permcoefs.plsRglm(
  dataset, ind, nt, modele, family = NULL, fit_backend = "stats",
  maxcoefvalues, ifbootfail, verbose
)
dataset |
dataset to resample |
ind |
indices for resampling |
nt |
number of components to use |
modele |
type of model to use, see plsRglm |
family |
glm family to use, see plsRglm |
fit_backend |
backend used for repeated non-ordinal score-space GLM
fits. Use |
maxcoefvalues |
maximum values allowed for the estimates of the coefficients to discard those coming from singular bootstrap samples |
ifbootfail |
value to return if the estimation fails on a bootstrap sample |
verbose |
should info messages be displayed ? |
estimates on a bootstrap sample or ifbootfail value if the
bootstrap computation fails.
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
See also bootplsglm.
data(Cornell)
# (Y,X) bootstrap of a PLSGLR model
# statistic=coefs.plsRglm is the default for (Y,X) bootstrap of PLSGLR models.
set.seed(250)
modplsglm <- plsRglm(Y~.,data=Cornell,1,modele="pls-glm-family",family=gaussian)
Cornell.bootYX <- bootplsglm(modplsglm, R=250, typeboot="plsmodel",
  sim="permutation", statistic=permcoefs.plsRglm, verbose=FALSE)
A function passed to boot to perform bootstrap.
permcoefs.plsRglm.raw(
  dataset,
  ind,
  nt,
  modele,
  family = NULL,
  fit_backend = "stats",
  maxcoefvalues,
  ifbootfail,
  verbose
)
dataset |
dataset to resample |
ind |
indices for resampling |
nt |
number of components to use |
modele |
type of model to use, see plsRglm |
family |
glm family to use, see plsRglm |
fit_backend |
backend used for repeated non-ordinal score-space GLM
fits. Use |
maxcoefvalues |
maximum values allowed for the estimates of the coefficients to discard those coming from singular bootstrap samples |
ifbootfail |
value to return if the estimation fails on a bootstrap sample |
verbose |
should info messages be displayed? |
estimates on a bootstrap sample or ifbootfail value if the
bootstrap computation fails.
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
See also bootplsglm.
data(Cornell)

# (Y,X) bootstrap of a PLSGLR model
set.seed(250)
modplsglm <- permcoefs.plsRglm.raw(Cornell[,-8],1:nrow(Cornell),nt=3,
modele="pls-glm-family",family=gaussian,maxcoefvalues=1e5,
ifbootfail=rep(0,3),verbose=FALSE)
A function passed to boot to perform bootstrap.
permcoefs.plsRglmnp(
  dataRepYtt,
  ind,
  nt,
  modele,
  family = NULL,
  maxcoefvalues,
  wwetoile,
  ifbootfail
)
dataRepYtt |
components' coordinates to bootstrap |
ind |
indices for resampling |
nt |
number of components to use |
modele |
type of model to use, see plsRglm |
family |
glm family to use, see plsRglm |
maxcoefvalues |
maximum values allowed for the estimates of the coefficients to discard those coming from singular bootstrap samples |
wwetoile |
values of the Wstar matrix in the original fit |
ifbootfail |
value to return if the estimation fails on a bootstrap sample |
estimates on a bootstrap sample or ifbootfail value if the
bootstrap computation fails.
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
See also bootplsglm
data(Cornell)

# (Y,X) bootstrap of a PLSGLR model
# statistic=coefs.plsRglm is the default for (Y,X) bootstrap of PLSGLR models.
set.seed(250)
modplsglm <- plsRglm(Y~.,data=Cornell,1,modele="pls-glm-family",family=gaussian)
Cornell.bootYT <- bootplsglm(modplsglm, R=250, statistic=permcoefs.plsRglmnp, verbose=FALSE)
A function passed to boot to perform bootstrap.
permcoefs.plsRnp(
  dataRepYtt,
  ind,
  nt,
  modele,
  maxcoefvalues,
  wwetoile,
  ifbootfail
)
dataRepYtt |
components' coordinates to bootstrap |
ind |
indices for resampling |
nt |
number of components to use |
modele |
type of model to use, see plsRglm |
maxcoefvalues |
maximum values allowed for the estimates of the coefficients to discard those coming from singular bootstrap samples |
wwetoile |
values of the Wstar matrix in the original fit |
ifbootfail |
value to return if the estimation fails on a bootstrap sample |
estimates on a bootstrap sample or ifbootfail value if the
bootstrap computation fails.
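The roles of dataRepYtt (component coordinates) and wwetoile (the Wstar matrix) can be illustrated with standard PLS algebra: since the scores are T = X Wstar, coefficients estimated on the scores map back to the predictors through Wstar. The sketch below is a hedged base R illustration with a single component; the helper name perm_coefs_yt_sketch and the column layout (response first, scores after) are assumptions, not the package's code.

```r
set.seed(42)
n <- 12
p <- 3
X <- scale(matrix(rnorm(n * p), n, p))
y <- as.numeric(scale(X %*% c(1, -0.5, 0.25) + rnorm(n, sd = 0.3)))

# One PLS component for a single response: with a single component the
# Wstar matrix reduces to the normalized weight vector w.
w <- crossprod(X, y)
w <- w / sqrt(sum(w^2))
scores <- X %*% w

# Hypothetical layout: response in column 1, component scores after it.
dataRepYtt <- cbind(y, scores)

perm_coefs_yt_sketch <- function(dataRepYtt, ind, wwetoile) {
  y_perm <- dataRepYtt[ind, 1]                 # permuted response
  tt <- dataRepYtt[, -1, drop = FALSE]         # scores stay in place
  c_hat <- lm.fit(x = tt, y = y_perm)$coefficients
  as.numeric(wwetoile %*% c_hat)               # back to the predictor scale
}

b_identity <- perm_coefs_yt_sketch(dataRepYtt, seq_len(n), w)
b_permuted <- perm_coefs_yt_sketch(dataRepYtt, sample(n), w)
```

Because only the small regression on the scores is refitted for each resample, this kind of (Y,T) resampling is much cheaper than refitting the whole PLS model.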
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
See also bootpls
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]

# Lazraq-Cleroux PLS (Y,X) bootstrap
# statistic=coefs.plsR is the default for (Y,X) resampling of PLSR models.
set.seed(250)
modpls <- plsR(yCornell,XCornell,1)
Cornell.bootYT <- bootpls(modpls, R=250, typeboot="fmodel_np", sim="permutation",
statistic=permcoefs.plsRnp, verbose=FALSE)
The caterpillar dataset was extracted from a 1973 study on pine
processionary caterpillars. It assesses the influence of some forest
settlement characteristics on the development of caterpillar colonies. The
response variable is the logarithmic transform of the average number of
nests of caterpillars per tree in an area of 500 square meters (x11).
There are k=10 potentially explanatory variables defined on n=33 areas.
A data frame with 33 observations on the following 11 variables.
x1: altitude (in meters)
x2: slope (in degrees)
x3: number of pines in the area
x4: height (in meters) of the tree sampled at the center of the area
x5: diameter (in meters) of the tree sampled at the center of the area
x6: index of the settlement density
x7: orientation of the area (from 1 if southbound to 2 otherwise)
x8: height (in meters) of the dominant tree
x9: number of vegetation strata
x10: mix settlement index (from 1 if not mixed to 2 if mixed)
x11: logarithmic transform of the average number of nests of caterpillars per tree
These caterpillars got their names from their habit of moving over the
ground in incredibly long head-to-tail processions when leaving their nest
to create a new colony.
The pine_sup dataset can be used as a test set to assess model
prediction error of a model trained on the pine dataset.
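A train/test evaluation of this kind can be sketched as follows. The helper test_rmse is hypothetical, lm() stands in for a PLS fit, and the two data frames are simulated stand-ins that only share the shape of pine (33 rows) and pine_sup (22 rows); the response column name x11 follows the description above.

```r
# Root mean squared prediction error of a model fitted on `train`
# and evaluated on `test`; x11 is the response column.
test_rmse <- function(train, test) {
  fit <- lm(x11 ~ ., data = train)   # stand-in for a PLS fit
  pred <- predict(fit, newdata = test)
  sqrt(mean((test$x11 - pred)^2))
}

# Simulated stand-ins with the same shape as pine (33 x 11) and
# pine_sup (22 x 11); these are NOT the real measurements.
set.seed(1)
make_fake <- function(n) {
  X <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("x", 1:10)))
  data.frame(X, x11 = as.numeric(X %*% seq(0.1, 1, by = 0.1) + rnorm(n)))
}
fake_pine <- make_fake(33)
fake_pine_sup <- make_fake(22)
rmse <- test_rmse(fake_pine, fake_pine_sup)
```

With the package loaded, the same helper could presumably be applied to the real data after data(pine) and data(pine_sup), provided the response column is indeed named x11.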
Tomassone R., Audrain S., Lesquoy-de Turckeim E., Millier C. (1992), “La régression, nouveaux regards sur une ancienne méthode statistique”, INRA, Actualités Scientifiques et Agronomiques, Masson, Paris.
J.-M. Marin, C. Robert. (2007). Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer, New-York, pages 48-49.
data(pine)
str(pine)
This is the complete caterpillar dataset from a 1973 study on pine
processionary caterpillars. It assesses the influence of some forest
settlement characteristics on the development of caterpillar colonies. The
response variable is the logarithmic transform of the average number of
nests of caterpillars per tree in an area of 500 square meters (x11).
There are k=10 potentially explanatory variables defined on n=55 areas.
A data frame with 55 observations on the following 11 variables.
x1: altitude (in meters)
x2: slope (in degrees)
x3: number of pines in the area
x4: height (in meters) of the tree sampled at the center of the area
x5: diameter (in meters) of the tree sampled at the center of the area
x6: index of the settlement density
x7: orientation of the area (from 1 if southbound to 2 otherwise)
x8: height (in meters) of the dominant tree
x9: number of vegetation strata
x10: mix settlement index (from 1 if not mixed to 2 if mixed)
x11: logarithmic transform of the average number of nests of caterpillars per tree
These caterpillars got their names from their habit of moving over the ground in incredibly long head-to-tail processions when leaving their nest to create a new colony.
Tomassone R., Audrain S., Lesquoy-de Turckeim E., Millier C. (1992), “La régression, nouveaux regards sur une ancienne méthode statistique”, INRA, Actualités Scientifiques et Agronomiques, Masson, Paris.
J.-M. Marin, C. Robert. (2007). Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer, New-York, pages 48-49.
data(pine_full)
str(pine_full)
This is a supplementary dataset (used as a test set for the pine
dataset) that was extracted from a 1973 study on pine processionary
caterpillars. It assesses the influence of some forest settlement
characteristics on the development of caterpillar colonies. The response
variable is the logarithmic transform of the average number of nests of
caterpillars per tree in an area of 500 square meters (x11). There
are k=10 potentially explanatory variables defined on n=22 areas.
A data frame with 22 observations on the following 11 variables.
x1: altitude (in meters)
x2: slope (in degrees)
x3: number of pines in the area
x4: height (in meters) of the tree sampled at the center of the area
x5: diameter (in meters) of the tree sampled at the center of the area
x6: index of the settlement density
x7: orientation of the area (from 1 if southbound to 2 otherwise)
x8: height (in meters) of the dominant tree
x9: number of vegetation strata
x10: mix settlement index (from 1 if not mixed to 2 if mixed)
x11: logarithmic transform of the average number of nests of caterpillars per tree
These caterpillars got their names from their habit of moving over the
ground in incredibly long head-to-tail processions when leaving their nest
to create a new colony.
The pine_sup dataset can be used as a test set to assess model
prediction error of a model trained on the pine dataset.
Tomassone R., Audrain S., Lesquoy-de Turckeim E., Millier C. (1992), “La régression, nouveaux regards sur une ancienne méthode statistique”, INRA, Actualités Scientifiques et Agronomiques, Masson, Paris.
J.-M. Marin, C. Robert. (2007). Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer, New-York, pages 48-49.
data(pine_sup)
str(pine_sup)
The caterpillar dataset was extracted from a 1973 study on pine
processionary caterpillars. It assesses the influence of some forest
settlement characteristics on the development of caterpillar colonies. There
are k=10 potentially explanatory variables defined on n=33 areas.
The value of x2 for the first observation was removed from the matrix of
predictors on purpose.
A data frame with 33 observations on the following 11 variables and one missing value.
x1: altitude (in meters)
x2: slope (in degrees)
x3: number of pines in the area
x4: height (in meters) of the tree sampled at the center of the area
x5: diameter (in meters) of the tree sampled at the center of the area
x6: index of the settlement density
x7: orientation of the area (from 1 if southbound to 2 otherwise)
x8: height (in meters) of the dominant tree
x9: number of vegetation strata
x10: mix settlement index (from 1 if not mixed to 2 if mixed)
x11: logarithmic transform of the average number of nests of caterpillars per tree
These caterpillars got their names from their habit of moving over the
ground in incredibly long head-to-tail processions when leaving their nest
to create a new colony.
pineNAX21 is a dataset with a single missing
value, intended for testing purposes.
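Locating a deliberately removed value before fitting can be sketched in base R. The matrix below is a simulated stand-in with the shape of the predictor block (33 rows, 10 columns); only the position of the NA (first observation, x2) mirrors the description above, and the object names are illustrative.

```r
# Stand-in predictor matrix (33 x 10); values are simulated, only the
# position of the missing cell follows the dataset description.
set.seed(3)
X <- matrix(rnorm(33 * 10), 33, 10, dimnames = list(NULL, paste0("x", 1:10)))
X[1, "x2"] <- NA

na_pos <- which(is.na(X), arr.ind = TRUE)   # row/column of each missing cell
n_missing <- sum(is.na(X))                  # total number of missing cells
complete_rows <- sum(complete.cases(X))     # rows without any NA
```

A check like this makes it easy to confirm that a method claiming to handle incomplete predictors is actually being exercised on the incomplete row.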
Tomassone R., Audrain S., Lesquoy-de Turckeim E., Millier C. (1992). “La régression, nouveaux regards sur une ancienne méthode statistique”, INRA, Actualités Scientifiques et Agronomiques, Masson, Paris.
data(pineNAX21)
str(pineNAX21)
This function provides a plot method for the class
"table.summary.cv.plsRglmmodel"
## S3 method for class 'table.summary.cv.plsRglmmodel'
plot(x, type = c("CVMC", "CVQ2Chi2", "CVPreChi2"), ...)
x |
an object of the class |
type |
the type of cross validation criterion to plot. |
... |
further arguments to be passed to or from methods. |
NULL
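As a rough illustration of the kind of summary such a plot conveys, the sketch below tabulates how often each number of components is selected across repeated cross-validations. The vector chosen_nt is made-up data, and the assumption that the plotted table holds selection counts per number of components is mine, not a statement about the object's exact structure.

```r
# Hypothetical outcome of NK = 8 repeated cross-validations: the number of
# components retained by one criterion in each repetition.
chosen_nt <- c(1, 2, 1, 1, 3, 1, 2, 1)

# Count how often each candidate number of components (1..4) was selected.
tab <- table(factor(chosen_nt, levels = 1:4))
prop <- prop.table(tab)                       # selection frequencies
most_frequent <- as.integer(names(tab)[which.max(tab)])
```

Calling barplot(tab) on such a table gives a quick view of how stable the choice of the number of components is across repetitions.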
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
bbb <- cv.plsRglm(Y~.,data=Cornell,nt=10,NK=1,
modele="pls-glm-family",family=gaussian(), verbose=FALSE)
plot(cvtable(summary(bbb,verbose=FALSE)),type="CVQ2Chi2")
rm(list=c("bbb"))

data(Cornell)
plot(cvtable(summary(cv.plsRglm(Y~.,data=Cornell,nt=10,NK=100,
modele="pls-glm-family",family=gaussian(), verbose=FALSE),
verbose=FALSE)),type="CVQ2Chi2")
This function provides a plot method for the class
"table.summary.cv.plsRmodel"
## S3 method for class 'table.summary.cv.plsRmodel'
plot(x, type = c("CVMC", "CVQ2", "CVPress"), ...)
x |
an object of the class |
type |
the type of cross validation criterion to plot. |
... |
further arguments to be passed to or from methods. |
NULL
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
bbb <- cv.plsR(Y~.,data=Cornell,nt=6,K=6,NK=5, verbose=FALSE)
plot(cvtable(summary(bbb)),type="CVQ2")
rm(list=c("bbb"))

data(Cornell)
plot(cvtable(summary(cv.plsR(Y~.,data=Cornell,nt=6,K=6,NK=100, verbose=FALSE))),type="CVQ2")
This function plots the confidence intervals derived using the function
confints.bootpls from a bootpls based object.
plots.confints.bootpls(
  ic_bootobject,
  indices = NULL,
  legendpos = "topleft",
  prednames = TRUE,
  articlestyle = TRUE,
  xaxisticks = TRUE,
  ltyIC = c(2, 4, 5, 1),
  colIC = c("darkgreen", "blue", "red", "black"),
  typeIC,
  las = par("las"),
  mar,
  mgp,
  ...
)
ic_bootobject |
an object created with the |
indices |
vector of indices of the variables to plot. Defaults to
|
legendpos |
position of the legend as in
|
prednames |
should the original names of the predictors be plotted?
Defaults to |
articlestyle |
should the extra blank zones of the margin be removed
from the plot? Defaults to |
xaxisticks |
should ticks for the x axis be plotted? Defaults to
|
ltyIC |
lty as in |
colIC |
col as in |
typeIC |
type of CI to plot. Defaults to |
las |
numeric in 0,1,2,3; the style of axis labels. 0: always parallel to the axis [default], 1: always horizontal, 2: always perpendicular to the axis, 3: always vertical. |
mar |
A numerical vector of the form |
mgp |
The margin line (in mex units) for the axis title, axis labels
and axis line. Note that |
... |
further options to pass to the
|
NULL
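For readers unfamiliar with the interval types named in the examples (such as "Normal" and "Basic"), the sketch below computes normal, basic and percentile bootstrap intervals for a toy statistic in base R. It follows the textbook definitions; the boot package uses its own quantile interpolation, so its numbers can differ slightly, and the BCa interval is left out because it needs additional acceleration estimates. All object names are illustrative.

```r
set.seed(250)
x <- rexp(40)                       # toy sample
t0 <- mean(x)                       # statistic on the original data
R <- 999
t_star <- replicate(R, mean(sample(x, replace = TRUE)))  # bootstrap replicates

alpha <- 0.05
q <- quantile(t_star, c(alpha / 2, 1 - alpha / 2), names = FALSE)
z <- qnorm(1 - alpha / 2)
bias <- mean(t_star) - t0           # bootstrap estimate of bias

ci <- rbind(
  Normal     = (t0 - bias) + c(-1, 1) * z * sd(t_star),
  Basic      = c(2 * t0 - q[2], 2 * t0 - q[1]),
  Percentile = q
)
colnames(ci) <- c("lower", "upper")
```

The basic interval is the percentile interval reflected around t0, which is why the two can disagree noticeably when the bootstrap distribution is skewed.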
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
data(Cornell)
modpls <- plsR(Y~.,data=Cornell,3)

# Lazraq-Cleroux PLS (Y,X) bootstrap
set.seed(250)
Cornell.bootYX <- bootpls(modpls, R=250, verbose=FALSE)
temp.ci <- confints.bootpls(Cornell.bootYX,2:8)

plots.confints.bootpls(temp.ci)
plots.confints.bootpls(temp.ci,prednames=FALSE)
plots.confints.bootpls(temp.ci,prednames=FALSE,articlestyle=FALSE,
main="Bootstrap confidence intervals for the bj")
plots.confints.bootpls(temp.ci,indices=1:3,prednames=FALSE)
plots.confints.bootpls(temp.ci,c(2,4,6),"bottomright")
plots.confints.bootpls(temp.ci,c(2,4,6),articlestyle=FALSE,
main="Bootstrap confidence intervals for some of the bj")

temp.ci <- confints.bootpls(Cornell.bootYX,typeBCa=FALSE)
plots.confints.bootpls(temp.ci)
plots.confints.bootpls(temp.ci,2:8)
plots.confints.bootpls(temp.ci,prednames=FALSE)

# Bastien CSDA 2005 (Y,T) bootstrap
Cornell.boot <- bootpls(modpls, typeboot="fmodel_np", R=250, verbose=FALSE)
temp.ci <- confints.bootpls(Cornell.boot,2:8)

plots.confints.bootpls(temp.ci)
plots.confints.bootpls(temp.ci,prednames=FALSE)
plots.confints.bootpls(temp.ci,prednames=FALSE,articlestyle=FALSE,
main="Bootstrap confidence intervals for the bj")
plots.confints.bootpls(temp.ci,indices=1:3,prednames=FALSE)
plots.confints.bootpls(temp.ci,c(2,4,6),"bottomright")
plots.confints.bootpls(temp.ci,c(2,4,6),articlestyle=FALSE,
main="Bootstrap confidence intervals for some of the bj")

temp.ci <- confints.bootpls(Cornell.boot,typeBCa=FALSE)
plots.confints.bootpls(temp.ci)
plots.confints.bootpls(temp.ci,2:8)
plots.confints.bootpls(temp.ci,prednames=FALSE)

data(aze_compl)
modplsglm <- plsRglm(y~.,data=aze_compl,3,modele="pls-glm-logistic")

# Lazraq-Cleroux PLS (Y,X) bootstrap
# should be run with R=1000 but takes much longer time
aze_compl.bootYX3 <- bootplsglm(modplsglm, typeboot="plsmodel", R=250, verbose=FALSE)
temp.ci <- confints.bootpls(aze_compl.bootYX3)

plots.confints.bootpls(temp.ci)
plots.confints.bootpls(temp.ci,prednames=FALSE)
plots.confints.bootpls(temp.ci,prednames=FALSE,articlestyle=FALSE,
main="Bootstrap confidence intervals for the bj")
plots.confints.bootpls(temp.ci,indices=1:33,prednames=FALSE)
plots.confints.bootpls(temp.ci,c(2,4,6),"bottomleft")
plots.confints.bootpls(temp.ci,c(2,4,6),articlestyle=FALSE,
main="Bootstrap confidence intervals for some of the bj")
plots.confints.bootpls(temp.ci,indices=1:34,prednames=FALSE)
plots.confints.bootpls(temp.ci,indices=1:33,prednames=FALSE,ltyIC=1,colIC=c(1,2))

temp.ci <- confints.bootpls(aze_compl.bootYX3,1:34,typeBCa=FALSE)
plots.confints.bootpls(temp.ci,indices=1:33,prednames=FALSE)

# Bastien CSDA 2005 (Y,T) Bootstrap
# much faster
aze_compl.bootYT3 <- bootplsglm(modplsglm, R=1000, verbose=FALSE)
temp.ci <- confints.bootpls(aze_compl.bootYT3)

plots.confints.bootpls(temp.ci)
plots.confints.bootpls(temp.ci,typeIC="Normal")
plots.confints.bootpls(temp.ci,typeIC=c("Normal","Basic"))
plots.confints.bootpls(temp.ci,typeIC="BCa",legendpos="bottomleft")
plots.confints.bootpls(temp.ci,prednames=FALSE)
plots.confints.bootpls(temp.ci,prednames=FALSE,articlestyle=FALSE,
main="Bootstrap confidence intervals for the bj")
plots.confints.bootpls(temp.ci,indices=1:33,prednames=FALSE)
plots.confints.bootpls(temp.ci,c(2,4,6),"bottomleft")
plots.confints.bootpls(temp.ci,c(2,4,6),articlestyle=FALSE,
main="Bootstrap confidence intervals for some of the bj")
plots.confints.bootpls(temp.ci,prednames=FALSE,ltyIC=c(2,1),colIC=c(1,2))

temp.ci <- confints.bootpls(aze_compl.bootYT3,1:33,typeBCa=FALSE)
plots.confints.bootpls(temp.ci,prednames=FALSE)
Light version of PLS_glm for cross validation purposes either on
complete or incomplete datasets.
PLS_glm_wvc(
  dataY,
  dataX,
  nt = 2,
  dataPredictY = dataX,
  modele = "pls",
  family = NULL,
  scaleX = TRUE,
  scaleY = NULL,
  keepcoeffs = FALSE,
  keepstd.coeffs = FALSE,
  tol_Xi = 10^(-12),
  weights,
  method = "logistic",
  fit_backend = "stats",
  verbose = TRUE
)
dataY |
response (training) dataset |
dataX |
predictor(s) (training) dataset |
nt |
number of components to be extracted |
dataPredictY |
predictor(s) (testing) dataset |
modele |
name of the PLS glm model to be fitted ( |
family |
a description of the error distribution and link function to
be used in the model. This can be a character string naming a family
function, a family function or the result of a call to a family function.
(See |
scaleX |
scale the predictor(s) : must be set to TRUE for
|
scaleY |
scale the response: Yes/No. Ignored since it is not always possible for glm responses. |
keepcoeffs |
whether the coefficients of the linear fit on link scale of unstandardized eXplanatory variables should be returned or not. |
keepstd.coeffs |
whether the coefficients of the linear fit on link scale of standardized eXplanatory variables should be returned or not. |
tol_Xi |
minimal value for Norm2(Xi) and |
weights |
an optional vector of 'prior weights' to be used in the
fitting process. Should be |
method |
logistic, probit, complementary log-log or cauchit (corresponding to a Cauchy latent variable). |
fit_backend |
backend used for repeated non-ordinal score-space GLM
fits. Use |
verbose |
should info messages be displayed? |
This function is called by PLS_glm_kfoldcv_formula in order to
perform cross-validation either on complete or incomplete datasets.
There are seven different predefined models with predefined link functions available :
ordinary pls models
glm gaussian with inverse link pls models
glm gaussian with identity link pls models
glm binomial with square inverse link pls models
glm binomial with logit link pls models
glm poisson with log link pls models
glm polr with logit link pls models
Using the "family=" option and setting
"modele=pls-glm-family" allows changing the family and link function
the same way as for the glm function. As a consequence
user-specified families can also be used.
The gaussian family accepts the links (as names) identity, log and inverse.
The binomial family accepts the links logit, probit, cauchit (corresponding to logistic, normal and Cauchy CDFs respectively), log and cloglog (complementary log-log).
The Gamma family accepts the links inverse, identity and log.
The poisson family accepts the links log, identity, and sqrt.
The inverse.gaussian family accepts the links 1/mu^2, inverse, identity and log.
The quasi family accepts the links logit, probit, cloglog, identity, inverse, log, 1/mu^2 and sqrt.
The function power can be used to create a power link function.
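The link choices above are expressed through ordinary R family objects, the same ones glm() accepts. The following sketch, using only functions from the stats package, shows how a family carries its link and how power() builds a power link; the object names are illustrative.

```r
# Family objects as accepted by glm(); each carries its link functions.
fam_log   <- gaussian(link = "log")
fam_prob  <- binomial(link = "probit")
fam_power <- quasi(link = power(1/3))   # power() builds a power link

# linkfun maps the mean to the linear predictor, linkinv maps it back.
eta <- fam_power$linkfun(8)             # cube root of 8
mu  <- fam_power$linkinv(eta)           # back on the mean scale
```

An object such as fam_log is what would be supplied through the family argument when modele is set to "pls-glm-family".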
Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations.
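The second interpretation of the weights can be checked numerically with glm() from the stats package: fitting group means with integer weights w_i gives the same coefficients as fitting the underlying unit-weight observations. The data below are simulated and the variable names are illustrative.

```r
set.seed(1)
x <- c(1, 2, 3, 4)          # one predictor value per group
w <- c(2L, 3L, 1L, 4L)      # number of unit-weight observations per group

# Raw unit-weight observations: group i contributes w[i] responses.
group <- rep(seq_along(x), times = w)
x_raw <- x[group]
y_raw <- 1 + 2 * x_raw + rnorm(sum(w))

# Each y_mean[i] is the mean of the w[i] responses in group i.
y_mean <- as.numeric(tapply(y_raw, group, mean))

fit_raw <- glm(y_raw ~ x_raw, family = gaussian())
fit_w   <- glm(y_mean ~ x, family = gaussian(), weights = w)

same_coefs <- all.equal(unname(coef(fit_raw)), unname(coef(fit_w)))
```

The coefficients agree because the weighted sum of squares on the means differs from the unweighted sum of squares on the raw data only by a term that does not depend on the coefficients.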
valsPredict |
|
coeffs |
If the coefficients of the
eXplanatory variables were requested: |
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
PLS_glm for more detailed results,
PLS_glm_kfoldcv for cross-validating models and
PLS_lm_wvc for the same function dedicated to plsR models
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
PLS_glm_wvc(dataY=yCornell,dataX=XCornell,nt=3,modele="pls-glm-gaussian",
dataPredictY=XCornell[1,])
PLS_glm_wvc(dataY=yCornell,dataX=XCornell,nt=3,modele="pls-glm-family",
family=gaussian(),dataPredictY=XCornell[1,], verbose=FALSE)
PLS_glm_wvc(dataY=yCornell[-1],dataX=XCornell[-1,],nt=3,modele="pls-glm-gaussian",
dataPredictY=XCornell[1,], verbose=FALSE)
PLS_glm_wvc(dataY=yCornell[-1],dataX=XCornell[-1,],nt=3,modele="pls-glm-family",
family=gaussian(),dataPredictY=XCornell[1,], verbose=FALSE)
rm("XCornell","yCornell")

## With an incomplete dataset (X[1,2] is NA)
data(pine)
ypine <- pine[,11]
data(XpineNAX21)
PLS_glm_wvc(dataY=ypine,dataX=XpineNAX21,nt=10,modele="pls-glm-gaussian")
rm("XpineNAX21","ypine")

data(pine)
Xpine<-pine[,1:10]
ypine<-pine[,11]
PLS_glm_wvc(ypine,Xpine,10,modele="pls", verbose=FALSE)
PLS_glm_wvc(ypine,Xpine,10,modele="pls-glm-Gamma", verbose=FALSE)
PLS_glm_wvc(ypine,Xpine,10,modele="pls-glm-family",family=Gamma(), verbose=FALSE)
PLS_glm_wvc(ypine,Xpine,10,modele="pls-glm-gaussian", verbose=FALSE)
PLS_glm_wvc(ypine,Xpine,10,modele="pls-glm-family",family=gaussian(log), verbose=FALSE)
PLS_glm_wvc(round(ypine),Xpine,10,modele="pls-glm-poisson", verbose=FALSE)
PLS_glm_wvc(round(ypine),Xpine,10,modele="pls-glm-family",family=poisson(log), verbose=FALSE)
rm(list=c("pine","ypine","Xpine"))

data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
PLS_glm_wvc(yCornell,XCornell,10,modele="pls-glm-inverse.gaussian", verbose=FALSE)
PLS_glm_wvc(yCornell,XCornell,10,modele="pls-glm-family",
family=inverse.gaussian(), verbose=FALSE)
rm(list=c("XCornell","yCornell"))

data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
PLS_glm_wvc(dataY=yCornell,dataX=XCornell,nt=3,modele="pls-glm-gaussian",
dataPredictY=XCornell[1,], verbose=FALSE)
PLS_glm_wvc(dataY=yCornell[-1],dataX=XCornell[-1,],nt=3,modele="pls-glm-gaussian",
dataPredictY=XCornell[1,], verbose=FALSE)
rm("XCornell","yCornell")

data(aze_compl)
Xaze_compl<-aze_compl[,2:34]
yaze_compl<-aze_compl$y
PLS_glm(yaze_compl,Xaze_compl,10,modele="pls-glm-logistic",typeVC="none", verbose=FALSE)$InfCrit
PLS_glm_wvc(yaze_compl,Xaze_compl,10,modele="pls-glm-logistic", keepcoeffs=TRUE, verbose=FALSE)
rm("Xaze_compl","yaze_compl")
Light version of PLS_lm for cross validation purposes either on
complete or incomplete datasets.
PLS_lm_wvc(
  dataY,
  dataX,
  nt = 2,
  dataPredictY = dataX,
  modele = "pls",
  scaleX = TRUE,
  scaleY = NULL,
  keepcoeffs = FALSE,
  keepstd.coeffs = FALSE,
  tol_Xi = 10^(-12),
  weights,
  verbose = TRUE
)
dataY |
response (training) dataset |
dataX |
predictor(s) (training) dataset |
nt |
number of components to be extracted |
dataPredictY |
predictor(s) (testing) dataset |
modele |
name of the PLS model to be fitted, only ( |
scaleX |
scale the predictor(s) : must be set to TRUE for
|
scaleY |
scale the response: Yes/No. Ignored since not always possible for glm responses. |
keepcoeffs |
whether the coefficients of unstandardized eXplanatory variables should be returned or not. |
keepstd.coeffs |
whether the coefficients of standardized eXplanatory variables should be returned or not. |
tol_Xi |
minimal value for Norm2(Xi) and |
weights |
an optional vector of 'prior weights' to be used in the
fitting process. Should be |
verbose |
should info messages be displayed ? |
This function is called by PLS_lm_kfoldcv in order to perform
cross-validation either on complete or incomplete datasets.
Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations.
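The equivalence stated above (integer weights w_i meaning that y_i is the mean of w_i unit-weight observations) can be checked with ordinary least squares. The following is a minimal sketch in base R using `lm` as a stand-in; it does not call `PLS_lm_wvc`, and the simulated data and variable names are assumptions made for illustration only.

```r
# Simulated data (illustrative assumption): w[i] replicates share the same x[i]
set.seed(42)
w <- c(2L, 3L, 1L, 4L, 2L, 5L)            # replicate counts, used as prior weights
x <- c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0)
group  <- rep(seq_along(x), times = w)     # group label of every raw observation
x_full <- rep(x, times = w)
y_full <- 1 + 2 * x_full + rnorm(length(x_full), sd = 0.3)

# Collapse the replicates to one mean response per group
y_mean <- as.vector(tapply(y_full, group, mean))

# Unweighted fit on all raw observations
coef_full <- unname(coef(lm(y_full ~ x_full)))
# Weighted fit on the group means, with weights equal to the replicate counts
coef_weighted <- unname(coef(lm(y_mean ~ x, weights = w)))

# Both fits give the same coefficients (up to numerical precision)
print(all.equal(coef_full, coef_weighted))
```

The point estimates agree; residual-based quantities differ between the two fits because the within-group variation is lost when the replicates are averaged.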
valsPredict |
|
coeffs |
If the coefficients of the
eXplanatory variables were requested: |
Use PLS_lm_kfoldcv for a wrapper in view of
cross-validation.
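The way a light fitting function is typically driven by a k-fold wrapper can be sketched as follows. This is only an illustration in base R: `fit_predict` is a hypothetical stand-in built on `lm` that mimics the `dataY` / `dataX` / `dataPredictY` calling pattern, and the fold construction is an assumption, not the code of `PLS_lm_kfoldcv`.

```r
set.seed(1)
n <- 12
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + X$x1 - 0.5 * X$x2 + rnorm(n, sd = 0.1)

# Randomly partition the row indices into k folds of (almost) equal size
k <- 3
folds <- split(sample(seq_len(n)), rep(seq_len(k), length.out = n))

# Hypothetical stand-in for a light fitting function: fit on the training
# rows and return predictions for the held-out rows only
fit_predict <- function(dataY, dataX, dataPredictY) {
  fit <- lm(dataY ~ ., data = data.frame(dataY = dataY, dataX))
  unname(predict(fit, newdata = dataPredictY))
}

# Each fold is held out once and predicted from the remaining rows
cv_pred <- numeric(n)
for (idx in folds) {
  cv_pred[idx] <- fit_predict(y[-idx],
                              X[-idx, , drop = FALSE],
                              X[idx, , drop = FALSE])
}

# Cross-validated sum of squared prediction errors
press <- sum((y - cv_pred)^2)
print(press)
```

Returning only the held-out predictions keeps each fold cheap, which is the reason for having a light version of the fitting function.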
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
PLS_lm for more detailed results,
PLS_lm_kfoldcv for cross-validating models and
PLS_glm_wvc for the same function dedicated to plsRglm models
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
PLS_lm_wvc(dataY=yCornell,dataX=XCornell,nt=3,dataPredictY=XCornell[1,])
PLS_lm_wvc(dataY=yCornell[-c(1,2)],dataX=XCornell[-c(1,2),],nt=3,dataPredictY=XCornell[c(1,2),], verbose=FALSE)
PLS_lm_wvc(dataY=yCornell[-c(1,2)],dataX=XCornell[-c(1,2),],nt=3,dataPredictY=XCornell[c(1,2),], keepcoeffs=TRUE, verbose=FALSE)
rm("XCornell","yCornell")

## With an incomplete dataset (X[1,2] is NA)
data(pine)
ypine <- pine[,11]
data(XpineNAX21)
PLS_lm_wvc(dataY=ypine[-1],dataX=XpineNAX21[-1,],nt=3, verbose=FALSE)
PLS_lm_wvc(dataY=ypine[-1],dataX=XpineNAX21[-1,],nt=3,dataPredictY=XpineNAX21[1,], verbose=FALSE)
PLS_lm_wvc(dataY=ypine[-2],dataX=XpineNAX21[-2,],nt=3,dataPredictY=XpineNAX21[2,], verbose=FALSE)
PLS_lm_wvc(dataY=ypine,dataX=XpineNAX21,nt=3, verbose=FALSE)
rm("ypine")
This function implements Partial least squares Regression models with leave one out cross validation for complete or incomplete datasets.
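As background on leave one out cross validation, the PRESS statistic and a Q2-type ratio can be illustrated on an ordinary linear model. This is a sketch under stated assumptions: it uses `lm` rather than PLS components, the data are simulated, and taking the total sum of squares as the reference for Q2 corresponds to the usual convention for a first component only.

```r
set.seed(2)
n <- 25
dat <- data.frame(x = rnorm(n))
dat$y <- 2 + 1.5 * dat$x + rnorm(n)

# Explicit leave one out: refit without row i, then predict row i
loo_pred <- vapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x, data = dat[-i, ])
  unname(predict(fit_i, newdata = dat[i, , drop = FALSE]))
}, numeric(1))
press_loo <- sum((dat$y - loo_pred)^2)

# Shortcut valid for least squares: leave one out residual = e_i / (1 - h_ii)
fit <- lm(y ~ x, data = dat)
press_hat <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)

# Q2-type ratio against the intercept-only model
rss_null <- sum((dat$y - mean(dat$y))^2)
q2 <- 1 - press_loo / rss_null

print(c(press_loo = press_loo, press_hat = press_hat, q2 = q2))
```

In PLS regression an additional component is usually retained when its Q2 exceeds a limit such as the `limQ2set` default of 0.0975.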
plsR(object, ...)
## Default S3 method:
plsRmodel(object, dataX, nt = 2, limQ2set = 0.0975,
  dataPredictY = dataX, modele = "pls", family = NULL, typeVC = "none",
  EstimXNA = FALSE, scaleX = TRUE, scaleY = NULL, pvals.expli = FALSE,
  alpha.pvals.expli = 0.05, MClassed = FALSE, tol_Xi = 10^(-12), weights,
  sparse = FALSE, sparseStop = TRUE, naive = FALSE,verbose=TRUE,...)
## S3 method for class 'formula'
plsRmodel(object, data, nt = 2, limQ2set = 0.0975,
  dataPredictY, modele = "pls", family = NULL, typeVC = "none",
  EstimXNA = FALSE, scaleX = TRUE, scaleY = NULL, pvals.expli = FALSE,
  alpha.pvals.expli = 0.05, MClassed = FALSE, tol_Xi = 10^(-12), weights,
  subset, contrasts = NULL, sparse = FALSE, sparseStop = TRUE, naive = FALSE,
  verbose=TRUE,...)
PLS_lm(dataY, dataX, nt = 2, limQ2set = 0.0975, dataPredictY = dataX,
  modele = "pls", family = NULL, typeVC = "none", EstimXNA = FALSE,
  scaleX = TRUE, scaleY = NULL, pvals.expli = FALSE,
  alpha.pvals.expli = 0.05, MClassed = FALSE, tol_Xi = 10^(-12),
  weights,sparse=FALSE,sparseStop=FALSE,naive=FALSE,verbose=TRUE)
PLS_lm_formula(formula,data=NULL,nt=2,limQ2set=.0975,dataPredictY=dataX,
  modele="pls",family=NULL,typeVC="none",EstimXNA=FALSE,scaleX=TRUE,
  scaleY=NULL,pvals.expli=FALSE,alpha.pvals.expli=.05,MClassed=FALSE,
  tol_Xi=10^(-12),weights,subset,contrasts=NULL,sparse=FALSE,
  sparseStop=FALSE,naive=FALSE,verbose=TRUE)
object |
response (training) dataset or an object of class " |
dataY |
response (training) dataset |
dataX |
predictor(s) (training) dataset |
formula |
an object of class " |
data |
an optional data frame, list or environment (or object coercible by |
nt |
number of components to be extracted |
limQ2set |
limit value for the Q2 |
dataPredictY |
predictor(s) (testing) dataset |
modele |
name of the PLS model to be fitted, only ( |
family |
for the present moment the family argument is ignored and set thanks to the value of modele. |
typeVC |
type of leave one out cross validation. Several procedures are available. If cross validation is required, one needs to select the way of predicting the response for left out observations. For complete rows, without any missing value, there are two different ways of computing these predictions. As a consequence, for mixed datasets, with complete and incomplete rows, there are two ways of computing predictions: either predict any row as if there were missing values in it (
|
EstimXNA |
only for |
scaleX |
scale the predictor(s) : must be set to TRUE for |
scaleY |
scale the response: Yes/No. Ignored since not always possible for glm responses. |
pvals.expli |
should individual p-values be reported to tune model selection ? |
alpha.pvals.expli |
level of significance for predictors when pvals.expli=TRUE |
MClassed |
number of misclassified cases, should only be used for binary responses |
tol_Xi |
minimal value for Norm2(Xi) and |
weights |
an optional vector of 'prior weights' to be used in the fitting process. Should be |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
contrasts |
an optional list. See the |
sparse |
should the coefficients of non-significant predictors (< |
sparseStop |
should component extraction stop when no significant predictors (< |
naive |
Use the naive estimates for the Degrees of Freedom in plsR? Default is |
verbose |
should info messages be displayed ? |
... |
arguments to pass to |
There are several ways to deal with missing values, and they lead to different computations of the leave one out cross validation criteria.
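One standard way of handling an incomplete row, in the spirit of NIPALS, is to compute its component score from the observed cells only and to rescale by the squared weights of those cells. The sketch below illustrates that idea in base R; the helper `score_with_na` and the toy matrix are hypothetical, and whether the package computes its scores in exactly this way is an assumption.

```r
# Hypothetical helper: score of each row on a weight vector w,
# using only the non-missing cells of that row
score_with_na <- function(X, w) {
  obs <- ifelse(is.na(X), 0, 1)   # indicator of non-NA values
  X0 <- X
  X0[is.na(X)] <- 0               # missing values replaced with 0
  drop(X0 %*% w) / drop(obs %*% w^2)
}

X <- matrix(c(1, 2, 3,
              4, NA, 6), nrow = 2, byrow = TRUE)
w <- c(2, 1, 2) / 3               # unit-norm weight vector

t_scores <- score_with_na(X, w)
print(t_scores)

# For a complete row and a unit-norm w this reduces to the usual projection
print(sum(X[1, ] * w))
```

The denominator shrinks when cells are missing, which compensates for the terms dropped from the numerator.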
A typical predictor has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with any duplicates removed.
A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second.
The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on: to avoid this pass a terms object as the formula.
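These formula rules can be inspected directly with `terms` from base R. The short sketch below is only an illustration; the variable names `y`, `a` and `b` are placeholders and no data are needed.

```r
# Helper returning the term labels that R derives from a formula
labels_of <- function(f, ...) attr(terms(f, ...), "term.labels")

crossed   <- labels_of(y ~ a * b)            # cross of a and b
spelled   <- labels_of(y ~ a + b + a:b)      # same model written out
dedup     <- labels_of(y ~ a + b + a)        # duplicates are removed
reordered <- labels_of(y ~ a:b + b + a)      # main effects are moved first
kept      <- labels_of(y ~ a:b + b + a, keep.order = TRUE)  # order preserved

print(crossed)
print(identical(crossed, spelled))
print(dedup)
print(reordered)
print(kept)
```

Building the terms object with `keep.order = TRUE` is one way to obtain the "terms object as the formula" mentioned above.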
Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations.
The default estimator for the Degrees of Freedom is that of Kraemer and Sugiyama. Information criteria are computed according to these estimates. Naive Degrees of Freedom and Information Criteria are also provided for comparison purposes. For more details, see N. Kraemer and M. Sugiyama (2011). The Degrees of Freedom of Partial Least Squares Regression. Journal of the American Statistical Association, 106(494), 697-705.
nr |
Number of observations |
nc |
Number of predictors |
nt |
Number of requested components |
ww |
raw weights (before L2-normalization) |
wwnorm |
L2 normed weights (to be used with deflated matrices of predictor variables) |
wwetoile |
modified weights (to be used with original matrix of predictor variables) |
tt |
PLS components |
pp |
loadings of the predictor variables |
CoeffC |
coefficients of the PLS components |
uscores |
scores of the response variable |
YChapeau |
predicted response values for the dataX set |
residYChapeau |
residuals of the deflated response on the standardized scale |
RepY |
scaled response vector |
na.miss.Y |
is there any NA value in the response vector |
YNA |
indicator vector of missing values in RepY |
residY |
deflated scaled response vector |
ExpliX |
scaled matrix of predictors |
na.miss.X |
is there any NA value in the predictor matrix |
XXNA |
indicator of non-NA values in the predictor matrix |
residXX |
deflated predictor matrix |
PredictY |
response values with NA replaced with 0 |
press.ind |
individual PRESS value for each observation (scaled scale) |
press.tot |
total PRESS value for all observations (scaled scale) |
family |
glm family used to fit PLSGLR model |
ttPredictY |
PLS components for the dataset on which prediction was requested |
typeVC |
type of leave one out cross-validation used |
dataX |
predictor values |
dataY |
response values |
computed_nt |
number of components that were computed |
CoeffCFull |
matrix of the coefficients of the predictors |
CoeffConstante |
value of the intercept (scaled scale) |
Std.Coeffs |
Vector of standardized regression coefficients |
press.ind2 |
individual PRESS value for each observation (original scale) |
RSSresidY |
residual sum of squares (scaled scale) |
Coeffs |
Vector of regression coefficients (used with the original data scale) |
Yresidus |
residuals of the PLS model |
RSS |
residual sum of squares (original scale) |
residusY |
residuals of the deflated response on the standardized scale |
AIC.std |
AIC.std vs number of components (AIC computed for the standardized model) |
AIC |
AIC vs number of components |
optional |
If the response is assumed to be binary:
|
ttPredictFittedMissingY |
Description of 'comp2' |
optional |
If cross validation was requested:
|
InfCrit |
table of Information Criteria |
Std.ValsPredictY |
predicted response values for supplementary dataset (standardized scale) |
ValsPredictY |
predicted response values for supplementary dataset (original scale) |
Std.XChapeau |
estimated values for missing values in the predictor matrix (standardized scale) |
XXwotNA |
predictor matrix with missing values replaced with 0 |
Use cv.plsR to cross-validate the plsRglm models and bootpls to bootstrap them.
Frederic Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frederic Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
See also plsRglm to fit PLSGLR models.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]

#maximum 6 components could be extracted from this dataset
#trying 10 to trigger automatic stopping criterion
modpls10<-plsR(yCornell,XCornell,10)
modpls10

#With iterated leave one out CV PRESS
modpls6cv<-plsR(Y~.,data=Cornell,6,typeVC="standard")
modpls6cv
cv.modpls<-cv.plsR(Y~.,data=Cornell,6,NK=100, verbose=FALSE)
res.cv.modpls<-cvtable(summary(cv.modpls))
plot(res.cv.modpls)

rm(list=c("XCornell","yCornell","modpls10","modpls6cv"))

#A binary response example
data(aze_compl)
Xaze_compl<-aze_compl[,2:34]
yaze_compl<-aze_compl$y
modpls.aze <- plsR(yaze_compl,Xaze_compl,10,MClassed=TRUE,typeVC="standard")
modpls.aze

#Direct access to not cross-validated values
modpls.aze$AIC
modpls.aze$AIC.std
modpls.aze$MissClassed

#Raw predicted values (not really probabilities since not constrained to [0,1])
modpls.aze$Probs
#Predicted values truncated to [0,1] (true probabilities)
modpls.aze$Probs.trc
modpls.aze$Probs-modpls.aze$Probs.trc

#Repeated cross validation of the model (NK=100 times)
cv.modpls.aze<-cv.plsR(y~.,data=aze_compl,10,NK=100, verbose=FALSE)
res.cv.modpls.aze<-cvtable(summary(cv.modpls.aze,MClassed=TRUE))
#High discrepancy in the choice of the number of components between repeated
#cross validation and the misclassification criterion
plot(res.cv.modpls.aze)

rm(list=c("Xaze_compl","yaze_compl","modpls.aze","cv.modpls.aze","res.cv.modpls.aze"))

#24 predictors
dimX <- 24
#2 components
Astar <- 2
simul_data_UniYX(dimX,Astar)
dataAstar2 <- data.frame(t(replicate(250,simul_data_UniYX(dimX,Astar))))
modpls.A2<- plsR(Y~.,data=dataAstar2,10,typeVC="standard")
modpls.A2
cv.modpls.A2<-cv.plsR(Y~.,data=dataAstar2,10,NK=100, verbose=FALSE)
res.cv.modpls.A2<-cvtable(summary(cv.modpls.A2,verbose=FALSE))
#Perfect choice for the Q2 criterion in PLSR
plot(res.cv.modpls.A2)

#Binarized data.frame
simbin1 <- data.frame(dicho(dataAstar2))
modpls.B2 <- plsR(Y~.,data=simbin1,10,typeVC="standard",MClassed=TRUE, verbose=FALSE)
modpls.B2
modpls.B2$Probs
modpls.B2$Probs.trc
modpls.B2$MissClassed
plsR(simbin1$Y,dataAstar2[,-1],10,typeVC="standard",MClassed=TRUE,verbose=FALSE)$InfCrit
cv.modpls.B2<-cv.plsR(Y~.,data=simbin1,2,NK=100,verbose=FALSE)
res.cv.modpls.B2<-cvtable(summary(cv.modpls.B2,MClassed=TRUE))
#Only one component found by the repeated CV misclassification criterion
plot(res.cv.modpls.B2)

rm(list=c("dimX","Astar","dataAstar2","modpls.A2","cv.modpls.A2",
"res.cv.modpls.A2","simbin1","modpls.B2","cv.modpls.B2","res.cv.modpls.B2"))
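The difference between raw and truncated predicted values for a binary response, as inspected in the examples through Probs and Probs.trc, can be illustrated with a few numbers. The values below are hypothetical and the 0.5 classification threshold is an assumption made for this sketch, not a documented setting of the package.

```r
# Hypothetical raw fitted values from a linear model on a 0/1 response:
# nothing constrains them to the unit interval
probs <- c(-0.12, 0.08, 0.47, 0.66, 1.09)

# Truncate to [0, 1] so that they can be read as probabilities
probs_trc <- pmin(pmax(probs, 0), 1)

# Classify with an assumed 0.5 threshold and count the misclassified cases
y_obs  <- c(0, 0, 1, 1, 1)
y_pred <- as.numeric(probs_trc > 0.5)
miss   <- sum(y_pred != y_obs)

print(probs - probs_trc)   # non-zero only where truncation was needed
print(miss)
```

Truncation changes only the values lying outside [0, 1], so it does not alter a classification based on a threshold inside that interval.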
This function computes the Degrees of Freedom using the Krylov representation of PLS and other quantities that are used to get information criteria values. For the time being, it only works with complete datasets.
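The Krylov representation referred to here states that the PLS coefficient vector with m components is the least squares solution restricted to the Krylov subspace spanned by X'y, (X'X)X'y, and so on. The sketch below checks this numerically on simulated, centered data. The function `pls1_nipals` is a hypothetical minimal implementation written for this illustration; it is not the package code and it does not compute the Degrees of Freedom themselves.

```r
set.seed(123)
n <- 40; p <- 5; m <- 2
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
y <- drop(X %*% c(1, -1, 0.5, 0, 0)) + rnorm(n)
y <- y - mean(y)

# Hypothetical minimal PLS1 (single response) via NIPALS with deflation of X
pls1_nipals <- function(X, y, m) {
  p <- ncol(X)
  W <- matrix(0, p, m); P <- matrix(0, p, m); q <- numeric(m)
  Xk <- X
  for (k in seq_len(m)) {
    w  <- drop(crossprod(Xk, y)); w <- w / sqrt(sum(w^2))  # weight vector
    sc <- drop(Xk %*% w)                                   # component scores
    pk <- drop(crossprod(Xk, sc)) / sum(sc^2)              # X loadings
    q[k] <- sum(y * sc) / sum(sc^2)                        # y loading
    Xk <- Xk - tcrossprod(sc, pk)                          # deflate X
    W[, k] <- w; P[, k] <- pk
  }
  drop(W %*% solve(crossprod(P, W), q))  # coefficients on the original X
}
beta_pls <- pls1_nipals(X, y, m)

# Least squares restricted to the Krylov subspace span{s, A s}
A <- crossprod(X)
s <- crossprod(X, y)
K <- cbind(s, A %*% s)
beta_krylov <- drop(K %*% solve(t(K) %*% A %*% K, t(K) %*% s))

print(all.equal(beta_pls, beta_krylov))
```

For larger m the Krylov basis becomes ill-conditioned, so a direct construction like this one is only suitable for small illustrations.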
## S3 method for class 'dof'
plsR(modplsR, naive = FALSE)
modplsR |
A plsR model i.e. an object returned by one of the functions
|
naive |
A boolean. |
If naive=FALSE, returns values for the estimated degrees of freedom and
error dispersion. If naive=TRUE, returns values for the naive
degrees of freedom and error dispersion. The original code from Nicole
Kraemer and Mikio L. Braun was unable to handle models with only one
component.
DoF |
Degrees of Freedom |
sigmahat |
Estimates of dispersion |
Yhat |
Predicted values |
yhat |
Square Euclidean norms of the predicted values |
RSS |
Residual Sums of Squares |
Nicole Kraemer, Mikio L. Braun with improvements from
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
N. Kraemer, M. Sugiyama. (2011). The Degrees of Freedom of
Partial Least Squares Regression. Journal of the American Statistical
Association, 106(494), 697-705.
N. Kraemer, M. Sugiyama, M.L. Braun.
(2009). Lanczos Approximations for the Speedup of Kernel Partial Least
Squares Regression, Proceedings of the Twelfth International
Conference on Artificial Intelligence and Statistics (AISTATS), 272-279.
aic.dof and infcrit.dof for computing
information criteria directly from a previously fitted plsR model.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
modpls <- plsR(yCornell,XCornell,4)
plsR.dof(modpls)
plsR.dof(modpls,naive=TRUE)
This function implements Partial least squares Regression generalized linear models for complete or incomplete datasets.
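To give an idea of what a PLS generalized linear model does, the sketch below builds a single component for a binary response in base R: one univariate logistic fit per standardized predictor supplies the weights, the normalized weights define the component, and a final glm is fitted on that component. This is a simplified illustration of a commonly described construction, written under stated assumptions (simulated data, one component, no missing values); it is not the implementation used by plsRglm.

```r
set.seed(11)
n <- 100
X <- scale(matrix(rnorm(n * 4), n, 4))           # standardized predictors
y <- rbinom(n, 1, plogis(0.8 * X[, 1] - 0.6 * X[, 2]))

# Step 1: slope of y on each predictor taken alone, from a logistic fit
w <- unname(apply(X, 2, function(xj) {
  coef(glm(y ~ xj, family = binomial()))[2]
}))

# Step 2: normalize the weights and build the component
w  <- w / sqrt(sum(w^2))
t1 <- drop(X %*% w)

# Step 3: final generalized linear model on the component
final_fit <- glm(y ~ t1, family = binomial())
print(coef(final_fit))
```

Further components would be built from the residuals of the predictors on the components already extracted, which is omitted here for brevity.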
plsRglm(object, ...)
## Default S3 method:
plsRglmmodel(object,dataX,nt=2,limQ2set=.0975,
  dataPredictY=dataX,modele="pls",family=NULL,typeVC="none",
  EstimXNA=FALSE,scaleX=TRUE,scaleY=NULL,pvals.expli=FALSE,
  alpha.pvals.expli=.05,MClassed=FALSE,tol_Xi=10^(-12),weights,
  sparse=FALSE,sparseStop=TRUE,naive=FALSE,fit_backend="stats",
  verbose=TRUE,...)
## S3 method for class 'formula'
plsRglmmodel(object,data=NULL,nt=2,limQ2set=.0975,
  dataPredictY,modele="pls",family=NULL,typeVC="none",
  EstimXNA=FALSE,scaleX=TRUE,scaleY=NULL,pvals.expli=FALSE,
  alpha.pvals.expli=.05,MClassed=FALSE,tol_Xi=10^(-12),weights,subset,
  start=NULL,etastart,mustart,offset,method="glm.fit",control= list(),
  contrasts=NULL,sparse=FALSE,sparseStop=TRUE,naive=FALSE,
  fit_backend="stats",verbose=TRUE,...)
PLS_glm(dataY, dataX, nt = 2, limQ2set = 0.0975, dataPredictY = dataX,
  modele = "pls", family = NULL, typeVC = "none", EstimXNA = FALSE,
  scaleX = TRUE, scaleY = NULL, pvals.expli = FALSE,
  alpha.pvals.expli = 0.05, MClassed = FALSE, tol_Xi = 10^(-12), weights,
  method, sparse = FALSE, sparseStop=FALSE, naive=FALSE,
  fit_backend="stats",verbose=TRUE)
PLS_glm_formula(formula,data=NULL,nt=2,limQ2set=.0975,dataPredictY=dataX,
  modele="pls",family=NULL,typeVC="none",EstimXNA=FALSE,scaleX=TRUE,
  scaleY=NULL,pvals.expli=FALSE,alpha.pvals.expli=.05,MClassed=FALSE,
  tol_Xi=10^(-12),weights,subset,start=NULL,etastart,mustart,offset,method,
  control= list(),contrasts=NULL,sparse=FALSE,sparseStop=FALSE,naive=FALSE,
  fit_backend="stats",verbose=TRUE)
object |
response (training) dataset or an object of class " |
dataY |
response (training) dataset |
dataX |
predictor(s) (training) dataset |
formula |
an object of class " |
data |
an optional data frame, list or environment (or object coercible by |
nt |
number of components to be extracted |
limQ2set |
limit value for the Q2 |
dataPredictY |
predictor(s) (testing) dataset |
modele |
name of the PLS glm model to be fitted ( |
family |
a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See |
typeVC |
type of leave one out cross validation. Kept for backward compatibility.
|
EstimXNA |
only for |
scaleX |
scale the predictor(s) : must be set to TRUE for |
scaleY |
scale the response: Yes/No. Ignored since not always possible for glm responses. |
pvals.expli |
should individual p-values be reported to tune model selection ? |
alpha.pvals.expli |
level of significance for predictors when pvals.expli=TRUE |
MClassed |
number of misclassified cases, should only be used for binary responses |
tol_Xi |
minimal value for Norm2(Xi) and |
weights |
an optional vector of 'prior weights' to be used in the fitting process. Should be |
fit_backend |
backend used for repeated non-ordinal score-space GLM fits. Use |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
start |
starting values for the parameters in the linear predictor. |
etastart |
starting values for the linear predictor. |
mustart |
starting values for the vector of means. |
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be |
method |
For non-ordinal GLM modes this argument is kept for backward compatibility; use |
control |
a list of parameters for controlling the fitting process. For |
contrasts |
an optional list. See the |
sparse |
should the coefficients of non-significant predictors (< |
sparseStop |
should component extraction stop when no significant predictors (< |
naive |
Use the naive estimates for the Degrees of Freedom in plsR? Default is |
verbose |
Should details be displayed ? |
... |
arguments to pass to |
There are seven different predefined models with predefined link functions available:
"pls": ordinary pls models
"pls-glm-Gamma": glm Gamma with inverse link pls models
"pls-glm-gaussian": glm gaussian with identity link pls models
"pls-glm-inverse.gaussian": glm inverse gaussian with 1/mu^2 link pls models
"pls-glm-logistic": glm binomial with logit link pls models
"pls-glm-poisson": glm poisson with log link pls models
"pls-glm-polr": glm polr with logit link pls models
Using the "family=" option and setting "modele=pls-glm-family" allows changing the family and link function the same way as for the glm function. As a consequence user-specified families can also be used.
gaussian family: accepts the links (as names) identity, log and inverse.
binomial family: accepts the links logit, probit, cauchit (corresponding to logistic, normal and Cauchy CDFs respectively), log and cloglog (complementary log-log).
Gamma family: accepts the links inverse, identity and log.
poisson family: accepts the links log, identity, and sqrt.
inverse.gaussian family: accepts the links 1/mu^2, inverse, identity and log.
quasi family: accepts the links logit, probit, cloglog, identity, inverse, log, 1/mu^2 and sqrt.
The function power can be used to create a power link function.
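The family objects and links listed above come from the stats package and can be inspected without fitting any PLS model. The sketch below shows the default link of each family, a non-default link, a power link, and a plain `glm` fit on simulated data; only base R functions are used and the simulated data are an assumption made for illustration.

```r
# Default link of each family
default_links <- c(
  gaussian         = gaussian()$link,
  binomial         = binomial()$link,
  Gamma            = Gamma()$link,
  poisson          = poisson()$link,
  inverse.gaussian = inverse.gaussian()$link
)
print(default_links)

# A non-default link is requested through the link argument
gamma_log <- Gamma(link = "log")
print(gamma_log$link)

# power() builds a power link object; lambda = 0.5 is a square root link
sqrt_link <- power(0.5)
print(sqrt_link$name)
print(sqrt_link$linkfun(4))   # link function: 4^0.5
print(sqrt_link$linkinv(2))   # inverse link: 2^(1/0.5)

# The same family object is what glm() expects
set.seed(7)
x <- runif(50)
y <- rgamma(50, shape = 5, rate = 5 / exp(1 + x))
fit <- glm(y ~ x, family = gamma_log)
print(family(fit)$link)
```

A family object built this way is what would be supplied through the family argument when modele is set to "pls-glm-family".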
A typical predictor has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with any duplicates removed.
A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second.
The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on: to avoid this pass a terms object as the formula.
Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations.
The default estimator for the Degrees of Freedom is that of Kraemer and Sugiyama, which only works for classical plsR models. For these models, Information criteria are computed according to these estimates. Naive Degrees of Freedom and Information Criteria are also provided for comparison purposes. For more details, see N. Kraemer and M. Sugiyama (2011). The Degrees of Freedom of Partial Least Squares Regression. Journal of the American Statistical Association, 106(494), 697-705.
The returned components depend on the model that was fitted. The following items are generally available.
nr |
Number of observations |
nc |
Number of predictors |
nt |
Number of requested components |
ww |
raw weights (before L2-normalization) |
wwnorm |
L2 normed weights (to be used with deflated matrices of predictor variables) |
wwetoile |
modified weights (to be used with original matrix of predictor variables) |
tt |
PLS components |
pp |
loadings of the predictor variables |
CoeffC |
coefficients of the PLS components |
uscores |
scores of the response variable |
YChapeau |
predicted response values for the dataX set |
residYChapeau |
residuals of the deflated response on the standardized scale |
RepY |
scaled response vector |
na.miss.Y |
is there any NA value in the response vector |
YNA |
indicator vector of missing values in RepY |
residY |
deflated scaled response vector |
ExpliX |
scaled matrix of predictors |
na.miss.X |
is there any NA value in the predictor matrix |
XXNA |
indicator of non-NA values in the predictor matrix |
residXX |
deflated predictor matrix |
PredictY |
response values with NA replaced with 0 |
RSS |
residual sum of squares (original scale) |
RSSresidY |
residual sum of squares (scaled scale) |
R2residY |
R2 coefficient value on the standardized scale |
R2 |
R2 coefficient value on the original scale |
press.ind |
individual PRESS value for each observation (scaled scale) |
press.tot |
total PRESS value for all observations (scaled scale) |
Q2cum |
cumulated Q2 (standardized scale) |
family |
glm family used to fit PLSGLR model |
ttPredictY |
PLS components for the dataset on which prediction was requested |
typeVC |
type of leave one out cross-validation used |
dataX |
predictor values |
dataY |
response values |
weights |
weights of the observations |
fit_backend |
backend used for repeated non-ordinal score-space GLM fits |
computed_nt |
number of components that were computed |
AIC |
AIC vs number of components |
BIC |
BIC vs number of components |
Coeffsmodel_vals |
|
ChisqPearson |
|
CoeffCFull |
matrix of the coefficients of the predictors |
CoeffConstante |
value of the intercept (scaled scale) |
Std.Coeffs |
Vector of standardized regression coefficients |
Coeffs |
Vector of regression coefficients (used with the original data scale) |
Yresidus |
residuals of the PLS model |
residusY |
residuals of the deflated response on the standardized scale |
InfCrit |
table of Information Criteria:
|
Std.ValsPredictY |
predicted response values for supplementary dataset (standardized scale) |
ValsPredictY |
predicted response values for supplementary dataset (original scale) |
Std.XChapeau |
estimated values for missing values in the predictor matrix (standardized scale) |
FinalModel |
final GLR model on the PLS components |
XXwotNA |
predictor matrix with missing values replaced with 0 |
call |
call |
AIC.std |
AIC.std vs number of components (AIC computed for the standardized model) |
Use cv.plsRglm to cross-validate the plsRglm models and bootplsglm to bootstrap them.
Frederic Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand and Frederic Bertrand (2010). Comparaison de la regression PLS et de la regression logistique PLS : application aux donnees d'allelotypage. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
See also plsR.
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
modplsglm <- plsRglm(yCornell,XCornell,10,modele="pls-glm-gaussian")
#To retrieve the final GLR model on the PLS components
finalmod <- modplsglm$FinalModel
#It is a glm object.
plot(finalmod)

#Cross validation
cv.modplsglm<-cv.plsRglm(Y~.,data=Cornell,6,NK=100,modele="pls-glm-gaussian", verbose=FALSE)
res.cv.modplsglm<-cvtable(summary(cv.modplsglm))
plot(res.cv.modplsglm)

#If no model specified, classic PLSR model
modpls <- plsRglm(Y~.,data=Cornell,6)
modpls
modpls$tt
modpls$uscores
modpls$pp
modpls$Coeffs

#rm(list=c("XCornell","yCornell",modpls,cv.modplsglm,res.cv.modplsglm))

data(aze_compl)
Xaze_compl<-aze_compl[,2:34]
yaze_compl<-aze_compl$y
plsRglm(yaze_compl,Xaze_compl,nt=10,modele="pls",MClassed=TRUE, verbose=FALSE)$InfCrit
modpls <- plsRglm(yaze_compl,Xaze_compl,nt=10,modele="pls-glm-logistic",
  MClassed=TRUE,pvals.expli=TRUE, verbose=FALSE)
modpls
colSums(modpls$pvalstep)
modpls$Coeffsmodel_vals

plot(plsRglm(yaze_compl,Xaze_compl,4,modele="pls-glm-logistic")$FinalModel)
plsRglm(yaze_compl[-c(99,72)],Xaze_compl[-c(99,72),],4,
  modele="pls-glm-logistic",pvals.expli=TRUE)$pvalstep
plot(plsRglm(yaze_compl[-c(99,72)],Xaze_compl[-c(99,72),],4,
  modele="pls-glm-logistic",pvals.expli=TRUE)$FinalModel)
rm(list=c("Xaze_compl","yaze_compl","modpls"))

data(bordeaux)
Xbordeaux<-bordeaux[,1:4]
ybordeaux<-factor(bordeaux$Quality,ordered=TRUE)
modpls <- plsRglm(ybordeaux,Xbordeaux,10,modele="pls-glm-polr",pvals.expli=TRUE)
modpls
colSums(modpls$pvalstep)

#Same model with one missing value in the predictors
XbordeauxNA<-Xbordeaux
XbordeauxNA[1,1] <- NA
modplsNA <- plsRglm(ybordeaux,XbordeauxNA,10,modele="pls-glm-polr",pvals.expli=TRUE)
modplsNA
colSums(modplsNA$pvalstep)
rm(list=c("Xbordeaux","XbordeauxNA","ybordeaux","modplsNA"))

data(pine)
Xpine<-pine[,1:10]
ypine<-pine[,11]
modpls1 <- plsRglm(ypine,Xpine,1)
modpls1$Std.Coeffs
modpls1$Coeffs
modpls4 <- plsRglm(ypine,Xpine,4)
modpls4$Std.Coeffs
modpls4$Coeffs
modpls4$PredictY[1,]
plsRglm(ypine,Xpine,4,dataPredictY=Xpine[1,])$PredictY[1,]

XpineNAX21 <- Xpine
XpineNAX21[1,2] <- NA
modpls4NA <- plsRglm(ypine,XpineNAX21,4)
modpls4NA$Std.Coeffs
modpls4NA$YChapeau[1,]
modpls4$YChapeau[1,]
modpls4NA$CoeffC
plsRglm(ypine,XpineNAX21,4,EstimXNA=TRUE)$XChapeau
plsRglm(ypine,XpineNAX21,4,EstimXNA=TRUE)$XChapeauNA

# compare pls-glm-gaussian with classic plsR
modplsglm4 <- plsRglm(ypine,Xpine,4,modele="pls-glm-gaussian")
cbind(modpls4$Std.Coeffs,modplsglm4$Std.Coeffs)

# without missing data
cbind(ypine,modpls4$ValsPredictY,modplsglm4$ValsPredictY)

# with missing data
modplsglm4NA <- plsRglm(ypine,XpineNAX21,4,modele="pls-glm-gaussian")
cbind((ypine),modpls4NA$ValsPredictY,modplsglm4NA$ValsPredictY)
rm(list=c("Xpine","ypine","modpls4","modpls4NA","modplsglm4","modplsglm4NA"))

data(fowlkes)
Xfowlkes <- fowlkes[,2:13]
yfowlkes <- fowlkes[,1]
modpls <- plsRglm(yfowlkes,Xfowlkes,4,modele="pls-glm-logistic",pvals.expli=TRUE)
modpls
colSums(modpls$pvalstep)
rm(list=c("Xfowlkes","yfowlkes","modpls"))

if(require(chemometrics)){
  data(hyptis)
  yhyptis <- factor(hyptis$Group,ordered=TRUE)
  Xhyptis <- as.data.frame(hyptis[,c(1:6)])
  options(contrasts = c("contr.treatment", "contr.poly"))
  modpls2 <- plsRglm(yhyptis,Xhyptis,6,modele="pls-glm-polr")
  modpls2$Coeffsmodel_vals
  modpls2$InfCrit
  modpls2$Coeffs
  modpls2$Std.Coeffs
  table(yhyptis,predict(modpls2$FinalModel,type="class"))
  rm(list=c("yhyptis","Xhyptis","modpls2"))
}

dimX <- 24
Astar <- 6
dataAstar6 <- t(replicate(250,simul_data_UniYX(dimX,Astar)))
ysimbin1 <- dicho(dataAstar6)[,1]
Xsimbin1 <- dicho(dataAstar6)[,2:(dimX+1)]
modplsglm <- plsRglm(ysimbin1,Xsimbin1,10,modele="pls-glm-logistic")
modplsglm
simbin=data.frame(dicho(dataAstar6))
cv.modplsglm <- suppressWarnings(cv.plsRglm(Y~.,data=simbin,nt=10,
  modele="pls-glm-logistic",NK=100, verbose=FALSE))
res.cv.modplsglm <- cvtable(summary(cv.modplsglm,MClassed=TRUE,
  verbose=FALSE))
plot(res.cv.modplsglm) #defaults to type="CVMC"
rm(list=c("dimX","Astar","dataAstar6","ysimbin1","Xsimbin1","modplsglm","cv.modplsglm", "res.cv.modplsglm"))data(Cornell) XCornell<-Cornell[,1:7] yCornell<-Cornell[,8] modplsglm <- plsRglm(yCornell,XCornell,10,modele="pls-glm-gaussian") #To retrieve the final GLR model on the PLS components finalmod <- modplsglm$FinalModel #It is a glm object. plot(finalmod) #Cross validation cv.modplsglm<-cv.plsRglm(Y~.,data=Cornell,6,NK=100,modele="pls-glm-gaussian", verbose=FALSE) res.cv.modplsglm<-cvtable(summary(cv.modplsglm)) plot(res.cv.modplsglm) #If no model specified, classic PLSR model modpls <- plsRglm(Y~.,data=Cornell,6) modpls modpls$tt modpls$uscores modpls$pp modpls$Coeffs #rm(list=c("XCornell","yCornell",modpls,cv.modplsglm,res.cv.modplsglm)) data(aze_compl) Xaze_compl<-aze_compl[,2:34] yaze_compl<-aze_compl$y plsRglm(yaze_compl,Xaze_compl,nt=10,modele="pls",MClassed=TRUE, verbose=FALSE)$InfCrit modpls <- plsRglm(yaze_compl,Xaze_compl,nt=10,modele="pls-glm-logistic", MClassed=TRUE,pvals.expli=TRUE, verbose=FALSE) modpls colSums(modpls$pvalstep) modpls$Coeffsmodel_vals plot(plsRglm(yaze_compl,Xaze_compl,4,modele="pls-glm-logistic")$FinalModel) plsRglm(yaze_compl[-c(99,72)],Xaze_compl[-c(99,72),],4, modele="pls-glm-logistic",pvals.expli=TRUE)$pvalstep plot(plsRglm(yaze_compl[-c(99,72)],Xaze_compl[-c(99,72),],4, modele="pls-glm-logistic",pvals.expli=TRUE)$FinalModel) rm(list=c("Xaze_compl","yaze_compl","modpls")) data(bordeaux) Xbordeaux<-bordeaux[,1:4] ybordeaux<-factor(bordeaux$Quality,ordered=TRUE) modpls <- plsRglm(ybordeaux,Xbordeaux,10,modele="pls-glm-polr",pvals.expli=TRUE) modpls colSums(modpls$pvalstep) XbordeauxNA<-Xbordeaux XbordeauxNA[1,1] <- NA modplsNA <- plsRglm(ybordeaux,XbordeauxNA,10,modele="pls-glm-polr",pvals.expli=TRUE) modpls colSums(modpls$pvalstep) rm(list=c("Xbordeaux","XbordeauxNA","ybordeaux","modplsNA")) data(pine) Xpine<-pine[,1:10] ypine<-pine[,11] modpls1 <- plsRglm(ypine,Xpine,1) modpls1$Std.Coeffs modpls1$Coeffs modpls4 <- 
plsRglm(ypine,Xpine,4) modpls4$Std.Coeffs modpls4$Coeffs modpls4$PredictY[1,] plsRglm(ypine,Xpine,4,dataPredictY=Xpine[1,])$PredictY[1,] XpineNAX21 <- Xpine XpineNAX21[1,2] <- NA modpls4NA <- plsRglm(ypine,XpineNAX21,4) modpls4NA$Std.Coeffs modpls4NA$YChapeau[1,] modpls4$YChapeau[1,] modpls4NA$CoeffC plsRglm(ypine,XpineNAX21,4,EstimXNA=TRUE)$XChapeau plsRglm(ypine,XpineNAX21,4,EstimXNA=TRUE)$XChapeauNA # compare pls-glm-gaussian with classic plsR modplsglm4 <- plsRglm(ypine,Xpine,4,modele="pls-glm-gaussian") cbind(modpls4$Std.Coeffs,modplsglm4$Std.Coeffs) # without missing data cbind(ypine,modpls4$ValsPredictY,modplsglm4$ValsPredictY) # with missing data modplsglm4NA <- plsRglm(ypine,XpineNAX21,4,modele="pls-glm-gaussian") cbind((ypine),modpls4NA$ValsPredictY,modplsglm4NA$ValsPredictY) rm(list=c("Xpine","ypine","modpls4","modpls4NA","modplsglm4","modplsglm4NA")) data(fowlkes) Xfowlkes <- fowlkes[,2:13] yfowlkes <- fowlkes[,1] modpls <- plsRglm(yfowlkes,Xfowlkes,4,modele="pls-glm-logistic",pvals.expli=TRUE) modpls colSums(modpls$pvalstep) rm(list=c("Xfowlkes","yfowlkes","modpls")) if(require(chemometrics)){ data(hyptis) yhyptis <- factor(hyptis$Group,ordered=TRUE) Xhyptis <- as.data.frame(hyptis[,c(1:6)]) options(contrasts = c("contr.treatment", "contr.poly")) modpls2 <- plsRglm(yhyptis,Xhyptis,6,modele="pls-glm-polr") modpls2$Coeffsmodel_vals modpls2$InfCrit modpls2$Coeffs modpls2$Std.Coeffs table(yhyptis,predict(modpls2$FinalModel,type="class")) rm(list=c("yhyptis","Xhyptis","modpls2")) } dimX <- 24 Astar <- 6 dataAstar6 <- t(replicate(250,simul_data_UniYX(dimX,Astar))) ysimbin1 <- dicho(dataAstar6)[,1] Xsimbin1 <- dicho(dataAstar6)[,2:(dimX+1)] modplsglm <- plsRglm(ysimbin1,Xsimbin1,10,modele="pls-glm-logistic") modplsglm simbin=data.frame(dicho(dataAstar6)) cv.modplsglm <- suppressWarnings(cv.plsRglm(Y~.,data=simbin,nt=10, modele="pls-glm-logistic",NK=100, verbose=FALSE)) res.cv.modplsglm <- cvtable(summary(cv.modplsglm,MClassed=TRUE, verbose=FALSE)) 
plot(res.cv.modplsglm) #defaults to type="CVMC" rm(list=c("dimX","Astar","dataAstar6","ysimbin1","Xsimbin1","modplsglm","cv.modplsglm", "res.cv.modplsglm"))
plsRmulti() implements an experimental complete-case linear PLS2 fit for
multivariate numeric responses. It is intentionally separate from
plsR so the current PLS1 API remains unchanged.
plsRmulti(object, ...)

## Default S3 method:
plsRmultiModel(
  object, dataX, nt = 2, limQ2set = 0.0975, dataPredictY,
  modele = "pls", family = NULL, typeVC = "none", EstimXNA = FALSE,
  scaleX = TRUE, scaleY = NULL, pvals.expli = FALSE,
  alpha.pvals.expli = 0.05, MClassed = FALSE, tol_Xi = 10^(-12),
  weights, sparse = FALSE, sparseStop = FALSE, naive = FALSE,
  verbose = TRUE, ...
)

## S3 method for class 'formula'
plsRmultiModel(
  object, data, nt = 2, limQ2set = 0.0975, modele = "pls",
  family = NULL, typeVC = "none", EstimXNA = FALSE, scaleX = TRUE,
  scaleY = NULL, pvals.expli = FALSE, alpha.pvals.expli = 0.05,
  MClassed = FALSE, tol_Xi = 10^(-12), weights, subset,
  contrasts = NULL, sparse = FALSE, sparseStop = FALSE,
  naive = FALSE, verbose = TRUE, ...
)
object |
For the default method, a numeric multivariate response matrix
or data frame with at least two columns. For the formula method, a formula of
the form |
... |
Not used. Extra arguments are rejected in this experimental release. |
dataX |
Numeric predictor matrix or data frame. |
nt |
Number of components to extract. |
limQ2set |
Kept for interface compatibility. Not supported by
|
dataPredictY |
Kept for interface compatibility. Not supported by
|
modele |
Only |
family |
Not supported in this experimental release. |
typeVC |
Only |
EstimXNA |
Not supported in this experimental release. |
scaleX |
Should predictors be scaled? |
scaleY |
Should responses be scaled? Defaults to |
pvals.expli |
Not supported in this experimental release. |
alpha.pvals.expli |
Not supported in this experimental release. |
MClassed |
Not supported in this experimental release. |
tol_Xi |
Tolerance used for degeneracy checks during component extraction. |
weights |
Not supported in this experimental release. |
sparse |
Not supported in this experimental release. |
sparseStop |
Not supported in this experimental release. |
naive |
Not supported in this experimental release. |
verbose |
Should informational messages be displayed? |
data |
An optional data frame for the formula method. |
subset |
An optional subset for the formula method. |
contrasts |
Optional contrasts for the formula method. |
This experimental release supports complete-case linear PLS2 fitting,
prediction, repeated k-fold cross-validation via cv.plsRmulti,
and bootstrap resampling via bootpls. It still does not support
missing values, weights, sparse extraction, classification diagnostics, or GLM
families.
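Because missing values are not supported in this release, it can help to check the inputs before fitting. The following is a minimal base-R sketch, not part of the package: the helper name `complete_rows` and the toy matrices are hypothetical, and how plsRmulti itself reacts to incomplete rows is not specified here.

```r
# Hypothetical helper (not part of plsRglm): keep only the rows that are
# complete in both the response matrix Y and the predictor matrix X.
complete_rows <- function(Y, X) {
  Y <- as.matrix(Y)
  X <- as.matrix(X)
  stopifnot(nrow(Y) == nrow(X))
  keep <- stats::complete.cases(Y, X)  # TRUE where neither matrix has an NA
  list(Y = Y[keep, , drop = FALSE],
       X = X[keep, , drop = FALSE],
       kept = which(keep))
}

# Toy data with one missing predictor value and one missing response value.
set.seed(1)
X <- matrix(rnorm(20), ncol = 2)
Y <- matrix(rnorm(20), ncol = 2)
X[3, 1] <- NA
Y[7, 2] <- NA

cc <- complete_rows(Y, X)
dim(cc$X)   # rows 3 and 7 have been dropped
cc$kept     # indices of the retained rows
```

The filtered matrices `cc$Y` and `cc$X` could then be passed to the fitting function in place of the raw inputs.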
An object of class "plsRmultiModel" with multivariate analogues of the
linear plsR outputs, including the extracted scores tt, X
loadings pp, response score coefficients CoeffC, coefficient
matrix Coeffs, intercept vector CoeffConstante, scaled response
matrix RepY, and fitted response matrices YChapeau,
Std.ValsPredictY, and ValsPredictY.
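To make the roles of the scores, loadings and response coefficients more concrete, here is a textbook NIPALS PLS2 sketch in base R. It is an illustration under stated assumptions, not the package's implementation: the function name `pls2_nipals`, its arguments and the element `coef_scaled` are hypothetical, and the list names `tt`, `pp` and `CoeffC` are borrowed from the description above only as analogues.

```r
# Textbook NIPALS PLS2 on centred and scaled data (illustrative only).
pls2_nipals <- function(X, Y, nt = 2, tol = 1e-12, maxit = 500) {
  E  <- scale(as.matrix(X))        # predictors, deflated at each step
  Fm <- scale(as.matrix(Y))        # responses, deflated at each step
  Xs <- E                          # keep the scaled predictors
  Tm <- matrix(0, nrow(E), nt)     # scores (analogue of tt)
  W  <- matrix(0, ncol(E), nt)     # weights
  P  <- matrix(0, ncol(E), nt)     # X loadings (analogue of pp)
  C  <- matrix(0, ncol(Fm), nt)    # response coefficients (analogue of CoeffC)
  for (h in seq_len(nt)) {
    u <- Fm[, 1, drop = FALSE]     # starting response score
    for (it in seq_len(maxit)) {
      w <- crossprod(E, u)
      w <- w / sqrt(sum(w^2))      # unit-norm weight vector
      t_h <- E %*% w               # X score
      c_h <- crossprod(Fm, t_h) / sum(t_h^2)
      u_new <- Fm %*% c_h / sum(c_h^2)
      converged <- sum((u_new - u)^2) < tol
      u <- u_new
      if (converged) break
    }
    p_h <- crossprod(E, t_h) / sum(t_h^2)
    E  <- E  - t_h %*% t(p_h)      # deflate predictors
    Fm <- Fm - t_h %*% t(c_h)      # deflate responses
    Tm[, h] <- t_h
    W[, h]  <- w
    P[, h]  <- p_h
    C[, h]  <- c_h
  }
  # Coefficients on the scaled predictors: Xs %*% B equals Tm %*% t(C).
  B <- W %*% solve(crossprod(P, W)) %*% t(C)
  list(tt = Tm, pp = P, CoeffC = C, coef_scaled = B, Xs = Xs)
}

set.seed(123)
X <- matrix(rnorm(60 * 4), ncol = 4)
Y <- cbind(
  y1 = X[, 1] - 0.5 * X[, 2] + rnorm(60, sd = 0.1),
  y2 = 0.3 * X[, 2] + X[, 3] + rnorm(60, sd = 0.1)
)
sk <- pls2_nipals(X, Y, nt = 2)
crossprod(sk$tt)   # off-diagonal entries are numerically zero
```

In this sketch the score vectors are mutually orthogonal, and the fitted scaled responses can be written either as `sk$tt %*% t(sk$CoeffC)` or as `sk$Xs %*% sk$coef_scaled`.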
predict.plsRmultiModel, cv.plsRmulti,
bootpls, plsR
set.seed(123)
X <- matrix(rnorm(60 * 4), ncol = 4)
Y <- cbind(
  y1 = X[, 1] - 0.5 * X[, 2] + rnorm(60, sd = 0.1),
  y2 = 0.3 * X[, 2] + X[, 3] + rnorm(60, sd = 0.1)
)
fit <- plsRmulti(Y, X, nt = 2, verbose = FALSE)
fit
head(predict(fit))
This function provides a predict method for the class "plsRglmmodel".
## S3 method for class 'plsRglmmodel'
predict(
  object,
  newdata,
  comps = object$computed_nt,
  type = c("link", "response", "terms", "scores", "class", "probs"),
  se.fit = FALSE,
  weights,
  dispersion = NULL,
  methodNA = "adaptative",
  verbose = TRUE,
  ...
)
object |
An object of the class |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
comps |
A single value giving the number of components to use for prediction. |
type |
Type of predicted value. Available choices are the glms ones
(" |
se.fit |
If TRUE, pointwise standard errors are produced for the predictions. |
weights |
Vector of case weights. If |
dispersion |
the dispersion of the GLM fit to be assumed in computing the standard errors. If omitted, that returned by summary applied to the object is used. |
methodNA |
Selects how the response or the scores of
the new data are predicted. For complete rows, without any missing value, there are two
different ways of computing the prediction. As a consequence, for mixed
datasets, with complete and incomplete rows, there are two ways of computing the
prediction: either predict every row as if it contained missing values
( |
verbose |
Should info messages be displayed? |
... |
Arguments to be passed on to |
When type is "response", a matrix of predicted response
values is returned.
When type is "scores", a score matrix is
returned.
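Since the "link" and "response" types follow the glm conventions (see predict.glm), a quick reminder of how the two scales relate may help. The sketch below uses a plain base-R logistic glm on simulated data rather than a PLS-GLM fit; the variable names are hypothetical and it only illustrates the relation between the two scales.

```r
# Base-R illustration with an ordinary logistic glm (not a PLS-GLM fit).
set.seed(42)
x <- rnorm(50)
y <- rbinom(50, size = 1, prob = plogis(0.5 + x))
fit_glm <- glm(y ~ x, family = binomial())

eta <- predict(fit_glm, type = "link")      # linear predictor (log-odds)
mu  <- predict(fit_glm, type = "response")  # fitted probabilities

# The response scale is the inverse link applied to the link scale.
all.equal(unname(mu), unname(plogis(eta)))
```

For a Gaussian family with the identity link the two scales coincide, whereas for a logistic model they differ as shown above.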
Frédéric Bertrand
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
See also predict.glm.
data(pine)
Xpine<-pine[,1:10]
ypine<-pine[,11]
data(pine_sup)
Xpine_sup<-pine_sup[,1:10]
Xpine_supNA<-Xpine_sup
Xpine_supNA[1,1]<-NA

modpls=plsRglm(object=ypine,dataX=Xpine,nt=6,modele="pls-glm-family",family="gaussian", verbose=FALSE)
modplsform=plsRglm(x11~.,data=pine,nt=6,modele="pls-glm-family",family="gaussian", verbose=FALSE)
modpls2=plsRglm(object=ypine,dataX=Xpine,nt=6,modele="pls-glm-family",
dataPredictY=Xpine_sup,family="gaussian", verbose=FALSE)
modpls2NA=plsRglm(object=ypine,dataX=Xpine,nt=6,modele="pls-glm-family",
dataPredictY=Xpine_supNA,family="gaussian", verbose=FALSE)

#Identical to predict(modpls,type="link") or modpls$Std.ValsPredictY
cbind(modpls$Std.ValsPredictY,modplsform$Std.ValsPredictY,
predict(modpls),predict(modplsform))

#Identical to predict(modpls,type="response") or modpls$ValsPredictY
cbind(modpls$ValsPredictY,modplsform$ValsPredictY,
predict(modpls,type="response"),predict(modplsform,type="response"))

#Identical to modpls$ttPredictY
predict(modpls,type="scores")
predict(modplsform,type="scores")

#Identical to modpls2$ValsPredictY
cbind(predict(modpls,newdata=Xpine_sup,type="response"),
predict(modplsform,newdata=Xpine_sup,type="response"))

#Select the number of components to use to derive the prediction
predict(modpls,newdata=Xpine_sup,type="response",comps=1)
predict(modpls,newdata=Xpine_sup,type="response",comps=3)
predict(modpls,newdata=Xpine_sup,type="response",comps=6)
try(predict(modpls,newdata=Xpine_sup,type="response",comps=8))

#Identical to modpls2$ttValsPredictY
predict(modpls,newdata=Xpine_sup,type="scores")

#Select the number of components in the scores matrix
predict(modpls,newdata=Xpine_sup,type="scores",comps=1)
predict(modpls,newdata=Xpine_sup,type="scores",comps=3)
predict(modpls,newdata=Xpine_sup,type="scores",comps=6)
try(predict(modpls,newdata=Xpine_sup,type="scores",comps=8))

#Identical to modpls2NA$ValsPredictY
predict(modpls,newdata=Xpine_supNA,type="response",methodNA="missingdata")
cbind(predict(modpls,newdata=Xpine_supNA,type="response"),
predict(modplsform,newdata=Xpine_supNA,type="response"))
predict(modpls,newdata=Xpine_supNA,type="response",comps=1)
predict(modpls,newdata=Xpine_supNA,type="response",comps=3)
predict(modpls,newdata=Xpine_supNA,type="response",comps=6)
try(predict(modpls,newdata=Xpine_supNA,type="response",comps=8))

#Identical to modpls2NA$ttPredictY
predict(modpls,newdata=Xpine_supNA,type="scores",methodNA="missingdata")
predict(modplsform,newdata=Xpine_supNA,type="scores",methodNA="missingdata")
predict(modpls,newdata=Xpine_supNA,type="scores")
predict(modplsform,newdata=Xpine_supNA,type="scores")
predict(modpls,newdata=Xpine_supNA,type="scores",comps=1)
predict(modpls,newdata=Xpine_supNA,type="scores",comps=3)
predict(modpls,newdata=Xpine_supNA,type="scores",comps=6)
try(predict(modpls,newdata=Xpine_supNA,type="scores",comps=8))
This function provides a predict method for the class "plsRmodel".
## S3 method for class 'plsRmodel'
predict(
  object,
  newdata,
  comps = object$computed_nt,
  type = c("response", "scores"),
  weights,
  methodNA = "adaptative",
  verbose = TRUE,
  ...
)
object |
An object of the class |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
comps |
A single value giving the number of components to use for prediction. |
type |
Type of predicted value. Available choices are the response
values (" |
weights |
Vector of case weights. If |
methodNA |
Selects how the response or the scores of
the new data are predicted. For complete rows, without any missing value, there are two
different ways of computing the prediction. As a consequence, for mixed
datasets, with complete and incomplete rows, there are two ways of computing the
prediction: either predict every row as if it contained missing values
( |
verbose |
Should info messages be displayed? |
... |
Arguments to be passed on to |
When type is "response", a matrix of predicted response
values is returned.
When type is "scores", a score matrix is
returned.
Frédéric Bertrand
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(pine)
Xpine<-pine[,1:10]
ypine<-pine[,11]
data(pine_sup)
Xpine_sup<-pine_sup[,1:10]
Xpine_supNA<-Xpine_sup
Xpine_supNA[1,1]<-NA

modpls=plsR(object=ypine,dataX=Xpine,nt=6,modele="pls", verbose=FALSE)
modplsform=plsR(x11~.,data=pine,nt=6,modele="pls", verbose=FALSE)
modpls2=plsR(object=ypine,dataX=Xpine,nt=6,modele="pls",dataPredictY=Xpine_sup, verbose=FALSE)
modpls2NA=plsR(object=ypine,dataX=Xpine,nt=6,modele="pls",dataPredictY=Xpine_supNA, verbose=FALSE)

#Identical to predict(modpls,type="response") or modpls$ValsPredictY
cbind(predict(modpls),predict(modplsform))

#Identical to modpls$ttPredictY
predict(modpls,type="scores")
predict(modplsform,type="scores")

#Identical to modpls2$ValsPredictY
cbind(predict(modpls,newdata=Xpine_sup,type="response"),
predict(modplsform,newdata=Xpine_sup,type="response"))

#Select the number of components to use to derive the prediction
predict(modpls,newdata=Xpine_sup,type="response",comps=1)
predict(modpls,newdata=Xpine_sup,type="response",comps=3)
predict(modpls,newdata=Xpine_sup,type="response",comps=6)
try(predict(modpls,newdata=Xpine_sup,type="response",comps=8))

#Identical to modpls2$ttValsPredictY
predict(modpls,newdata=Xpine_sup,type="scores")

#Select the number of components in the scores matrix
predict(modpls,newdata=Xpine_sup,type="scores",comps=1)
predict(modpls,newdata=Xpine_sup,type="scores",comps=3)
predict(modpls,newdata=Xpine_sup,type="scores",comps=6)
try(predict(modpls,newdata=Xpine_sup,type="scores",comps=8))

#Identical to modpls2NA$ValsPredictY
predict(modpls,newdata=Xpine_supNA,type="response",methodNA="missingdata")
cbind(predict(modpls,newdata=Xpine_supNA,type="response"),
predict(modplsform,newdata=Xpine_supNA,type="response"))
predict(modpls,newdata=Xpine_supNA,type="response",comps=1)
predict(modpls,newdata=Xpine_supNA,type="response",comps=3)
predict(modpls,newdata=Xpine_supNA,type="response",comps=6)
try(predict(modpls,newdata=Xpine_supNA,type="response",comps=8))

#Identical to modpls2NA$ttPredictY
predict(modpls,newdata=Xpine_supNA,type="scores",methodNA="missingdata")
predict(modplsform,newdata=Xpine_supNA,type="scores",methodNA="missingdata")
predict(modpls,newdata=Xpine_supNA,type="scores")
predict(modplsform,newdata=Xpine_supNA,type="scores")
predict(modpls,newdata=Xpine_supNA,type="scores",comps=1)
predict(modpls,newdata=Xpine_supNA,type="scores",comps=3)
predict(modpls,newdata=Xpine_supNA,type="scores",comps=6)
try(predict(modpls,newdata=Xpine_supNA,type="scores",comps=8))
Prediction method for "plsRmultiModel" objects.
## S3 method for class 'plsRmultiModel'
predict(
  object,
  newdata,
  comps = object$computed_nt,
  type = c("response", "scores"),
  verbose = TRUE,
  ...
)
object |
An object of class |
newdata |
Optional predictor matrix or data frame. For formula-fitted models, a data frame with the predictor variables used at fit time. |
comps |
Number of extracted components to use. |
type |
Either |
verbose |
Should informational messages be displayed? |
... |
Not used. |
If type = "response", a matrix of predicted responses. If
type = "scores", a matrix of latent score coordinates.
set.seed(123)
X <- matrix(rnorm(40 * 3), ncol = 3)
Y <- cbind(
  y1 = X[, 1] + rnorm(40, sd = 0.1),
  y2 = X[, 2] - X[, 3] + rnorm(40, sd = 0.1)
)
fit <- plsRmulti(Y, X, nt = 2, verbose = FALSE)
predict(fit, type = "response")
predict(fit, type = "scores")
This function provides a print method for the class "coef.plsRglmmodel".
## S3 method for class 'coef.plsRglmmodel'
print(x, ...)
x |
an object of the class |
... |
not used |
NULL
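The print methods documented here follow the usual S3 pattern: calling print() on an object dispatches on its class attribute. As a minimal base-R sketch of that mechanism, the toy class `toy_coef` and its method below are hypothetical and not part of plsRglm; returning NULL invisibly simply mirrors the return value stated above.

```r
# Hypothetical toy class; it only illustrates S3 dispatch of print().
toy <- structure(
  list(coefs = c(Intercept = 1.5, x1 = -0.25)),
  class = "toy_coef"
)

print.toy_coef <- function(x, ...) {
  cat("Coefficients of the toy model:\n")
  print(x$coefs)
  invisible(NULL)  # nothing useful is returned; printing is the side effect
}

out <- print(toy)  # dispatches to print.toy_coef
is.null(out)
```

Typing the object name at the console triggers the same method, which is why the examples in this manual can display a fitted object just by naming it.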
Frédéric Bertrand
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
modplsglm <- plsRglm(yCornell,XCornell,3,modele="pls-glm-family",family=gaussian())
class(modplsglm)
print(coef(modplsglm))
rm(list=c("XCornell","yCornell","modplsglm"))
This function provides a print method for the class "coef.plsRmodel".
## S3 method for class 'coef.plsRmodel'
print(x, ...)
x |
an object of the class |
... |
not used |
NULL
Frédéric Bertrand
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
modpls <- plsRglm(yCornell,XCornell,3,modele="pls")
class(modpls)
print(coef(modpls))
rm(list=c("XCornell","yCornell","modpls"))
This function provides a print method for the class "cv.plsRglmmodel".
## S3 method for class 'cv.plsRglmmodel'
print(x, ...)
x |
an object of the class |
... |
not used |
NULL
Frédéric Bertrand
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparaison de la régression PLS et de la régression logistique PLS : application aux données d'allélotypage. Journal de la Société Française de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
print(cv.plsRglm(object=yCornell,dataX=XCornell,nt=10,NK=1,
modele="pls-glm-family",family=gaussian(), verbose=FALSE))
rm(list=c("XCornell","yCornell"))
This function provides a print method for the class "cv.plsRmodel".
## S3 method for class 'cv.plsRmodel'
print(x, ...)
x |
an object of the class |
... |
not used |
NULL
Frédéric Bertrand
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
print(cv.plsR(object=yCornell,dataX=XCornell,nt=10,K=6, verbose=FALSE))
rm(list=c("XCornell","yCornell"))
This function provides a print method for the class "plsRglmmodel".
## S3 method for class 'plsRglmmodel'
print(x, ...)
x |
an object of the class |
... |
not used |
NULL
Frédéric Bertrand
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparaison de la régression PLS et de la régression logistique PLS : application aux données d'allélotypage. Journal de la Société Française de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
modplsglm <- plsRglm(yCornell,XCornell,3,modele="pls-glm-gaussian")
class(modplsglm)
print(modplsglm)
rm(list=c("XCornell","yCornell","modplsglm"))
This function provides a print method for the class "plsRmodel".
## S3 method for class 'plsRmodel'
print(x, ...)
x |
an object of the class |
... |
not used |
NULL
Frédéric Bertrand
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparaison de la régression PLS et de la régression logistique PLS : application aux données d'allélotypage. Journal de la Société Française de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
XCornell<-Cornell[,1:7]
yCornell<-Cornell[,8]
modpls <- plsRglm(yCornell,XCornell,3,modele="pls")
class(modpls)
print(modpls)
rm(list=c("XCornell","yCornell","modpls"))
This function provides a print method for the class
"summary.plsRglmmodel"
## S3 method for class 'summary.plsRglmmodel'
print(x, ...)
| x | an object of the class "summary.plsRglmmodel" |
|---|---|
| ... | not used |

| language | call of the model |
|---|---|
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparaison de la régression PLS et de la régression logistique PLS : application aux données d'allélotypage. Journal de la Société Française de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
XCornell <- Cornell[,1:7]
yCornell <- Cornell[,8]
modplsglm <- plsRglm(yCornell,XCornell,3,modele="pls-glm-gaussian")
class(modplsglm)
print(summary(modplsglm))
rm(list=c("XCornell","yCornell","modplsglm"))
This function provides a print method for the class
"summary.plsRmodel"
## S3 method for class 'summary.plsRmodel'
print(x, ...)
| x | an object of the class "summary.plsRmodel" |
|---|---|
| ... | not used |

| language | call of the model |
|---|---|
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparaison de la régression PLS et de la régression logistique PLS : application aux données d'allélotypage. Journal de la Société Française de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
XCornell <- Cornell[,1:7]
yCornell <- Cornell[,8]
modpls <- plsRglm(yCornell,XCornell,3,modele="pls")
class(modpls)
print(summary(modpls))
rm(list=c("XCornell","yCornell","modpls"))
This function plots, for each model, which predictors are significant, as encoded in matbin.
signpred(
  matbin,
  pred.lablength = max(sapply(rownames(matbin), nchar)),
  labsize = 1,
  plotsize = 12
)
| matbin | Matrix with 0 or 1 entries, with one row per predictor and one column per model. 0 means the predictor is not significant in the model; 1 means it is significant. |
|---|---|
| pred.lablength | Maximum length of the predictors labels. Defaults to full label length. |
| labsize | Size of the predictors labels. |
| plotsize | Global size of the graph. |
This function is based on the visweb function from
the bipartite package.
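As a rough illustration of how a matbin matrix might be assembled, the sketch below codes a predictor as significant when a confidence interval excludes zero. The interval bounds, the predictor names, and the model names are invented for the example; they are assumptions and not output of the package, and the significance rule itself is only one possible choice.

```r
# Hypothetical lower confidence bounds: one row per predictor,
# one column per model (values invented for illustration only).
lower <- matrix(c(-0.2,  0.1, -0.5,
                   0.3,  0.2, -0.1,
                  -0.3, -0.2,  0.05),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("X1", "X2", "X3"),
                                c("model1", "model2", "model3")))
# Hypothetical upper bounds: every interval has width 0.4.
upper <- lower + 0.4

# A predictor is coded 1 when its interval excludes 0, and 0 otherwise.
matbin <- (lower > 0 | upper < 0) * 1
matbin

# With plsRglm attached, the matrix could then be plotted:
# signpred(matbin)
```

Row and column names are carried over to matbin, which matters because the default of pred.lablength is computed from rownames(matbin).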
A plot window.
Bernd Gruber with minor modifications from
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Vazquez, P.D., Chacoff, N.P. and Cagnolo, L. (2009) Evaluating multiple determinants of the structure of plant-animal mutualistic networks. Ecology, 90:2039-2046.
See Also visweb
signpred(matrix(rbinom(160,1,.2),ncol=8,dimnames=list(as.character(1:20),as.character(1:8))))
This function generates a single multivariate response value
and a vector of explanatory variables
drawn from a model with a given number of
latent components.
simul_data_complete(totdim, ncomp)
| totdim | Number of columns of the X vector (from |
|---|---|
| ncomp | Number of latent components in the model (from 2 to 6) |
This function should be combined with the replicate function to give rise to a larger dataset. The algorithm used is a port of the one described in the article of Li which is a multivariate generalization of the algorithm of Naes and Martens.
| simX | Vector of explanatory variables |
|---|---|
| HH | Dimension of the response |
| eta | See Li et al. |
| r | See Li et al. |
| epsilon | See Li et al. |
| ksi | See Li et al. |
| f | See Li et al. |
| z | See Li et al. |
| Y | See Li et al. |
The value of HH (the dimension of the response) depends on the value of ncomp:
| ncomp | HH |
|---|---|
| 2 | 3 |
| 3 | 3 |
| 4 | 4 |
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
T. Naes, H. Martens, Comparison of prediction methods for
multicollinear data, Commun. Stat., Simul. 14 (1985) 545-576.
Baibing Li, Julian Morris, Elaine B. Martin, Model selection for partial least squares
regression, Chemometrics and Intelligent Laboratory
Systems 64 (2002) 79-89, doi:10.1016/S0169-7439(02)00051-5.
simul_data_YX for data simulation purpose
simul_data_complete(20,6)
dimX <- 6
Astar <- 2
simul_data_complete(dimX,Astar)
dimX <- 6
Astar <- 3
simul_data_complete(dimX,Astar)
dimX <- 6
Astar <- 4
simul_data_complete(dimX,Astar)
rm(list=c("dimX","Astar"))
This function generates a single univariate response value and a
vector of explanatory variables drawn from a
model with a given number of latent components.
simul_data_UniYX(totdim, ncomp)
| totdim | Number of columns of the X vector (from |
|---|---|
| ncomp | Number of latent components in the model (from 2 to 6) |
This function should be combined with the replicate function to give rise to a larger dataset. The algorithm used is a port of the one described in the article of Li which is a multivariate generalization of the algorithm of Naes and Martens.
vector |
|
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
T. Naes, H. Martens, Comparison of prediction methods for
multicollinear data, Commun. Stat., Simul. 14 (1985) 545-576.
Baibing Li, Julian Morris, Elaine B. Martin, Model selection for partial least squares
regression, Chemometrics and Intelligent Laboratory Systems 64 (2002)
79-89, doi:10.1016/S0169-7439(02)00051-5.
simul_data_YX and simul_data_complete
for generating multivariate data
simul_data_UniYX(20,6)
dimX <- 6
Astar <- 2
simul_data_UniYX(dimX,Astar)
(dataAstar2 <- data.frame(t(replicate(50,simul_data_UniYX(dimX,Astar)))))
cvtable(summary(cv.plsR(Y~.,data=dataAstar2,5,NK=100, verbose=FALSE)))
dimX <- 6
Astar <- 3
simul_data_UniYX(dimX,Astar)
(dataAstar3 <- data.frame(t(replicate(50,simul_data_UniYX(dimX,Astar)))))
cvtable(summary(cv.plsR(Y~.,data=dataAstar3,5,NK=100, verbose=FALSE)))
dimX <- 6
Astar <- 4
simul_data_UniYX(dimX,Astar)
(dataAstar4 <- data.frame(t(replicate(50,simul_data_UniYX(dimX,Astar)))))
cvtable(summary(cv.plsR(Y~.,data=dataAstar4,5,NK=100, verbose=FALSE)))
rm(list=c("dimX","Astar","dataAstar2","dataAstar3","dataAstar4"))
This function generates a single univariate binomial response value
and a vector of explanatory variables drawn
from a model with a given number of latent components.
simul_data_UniYX_binom(totdim, ncomp, link = "logit", offset = 0)
| totdim | Number of columns of the X vector (from |
|---|---|
| ncomp | Number of latent components in the model (from 2 to 6) |
| link | Character specification of the link function in the mean model (mu). Currently, " |
| offset | Offset on the linear scale |
This function should be combined with the replicate function to give rise to a larger dataset. The algorithm used is a modification of a port of the one described in the article of Li which is a multivariate generalization of the algorithm of Naes and Martens.
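To make the roles of the link and offset arguments more concrete, the following base R sketch applies a few inverse link functions to a linear predictor and shows how an offset shifts the resulting success probability. It is only an illustration under assumptions: the values in eta are hypothetical, only links known to stats::make.link are used, and this is not the internal code of simul_data_UniYX_binom.

```r
eta <- c(-2, 0, 2)   # hypothetical linear predictor values

# Inverse links available in base R through stats::make.link().
links <- c("logit", "probit", "cloglog", "cauchit")
probs <- sapply(links, function(l) make.link(l)$linkinv(eta))
rownames(probs) <- paste0("eta=", eta)
probs

# An offset is added on the linear scale, before the inverse link is applied.
p_plus  <- make.link("logit")$linkinv(eta + 5)
p_minus <- make.link("logit")$linkinv(eta - 5)
rbind(p_plus, p_minus)
```

A large positive offset pushes the success probabilities towards 1 and a large negative one towards 0, which is the behaviour the offset=5 and offset=-5 histograms in the examples are meant to display.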
vector |
|
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
T. Naes, H. Martens, Comparison of prediction methods for
multicollinear data, Commun. Stat., Simul. 14 (1985) 545-576.
Baibing Li, Julian Morris, Elaine B. Martin, Model selection for partial least squares
regression, Chemometrics and Intelligent Laboratory Systems 64 (2002)
79-89, doi:10.1016/S0169-7439(02)00051-5.
layout(matrix(1:6,nrow=2))
# logit link
hist(t(replicate(100,simul_data_UniYX_binom(4,4)))[,1])
# probit link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="probit")))[,1])
# cloglog link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="cloglog")))[,1])
# cauchit link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="cauchit")))[,1])
# loglog link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="loglog")))[,1])
# log link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="log")))[,1])
layout(1)
layout(matrix(1:6,nrow=2))
# logit link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,offset=5)))[,1])
# probit link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="probit",offset=5)))[,1])
# cloglog link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="cloglog",offset=5)))[,1])
# cauchit link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="cauchit",offset=5)))[,1])
# loglog link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="loglog",offset=5)))[,1])
# log link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="log",offset=5)))[,1])
layout(1)
layout(matrix(1:6,nrow=2))
# logit link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,offset=-5)))[,1])
# probit link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="probit",offset=-5)))[,1])
# cloglog link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="cloglog",offset=-5)))[,1])
# cauchit link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="cauchit",offset=-5)))[,1])
# loglog link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="loglog",offset=-5)))[,1])
# log link
hist(t(replicate(100,simul_data_UniYX_binom(4,4,link="log",offset=-5)))[,1])
layout(1)
This function generates a single multivariate response value
and a vector of explanatory variables
drawn from a model with a given number of
latent components.
simul_data_YX(totdim, ncomp)
| totdim | Number of columns of the X vector (from |
|---|---|
| ncomp | Number of latent components in the model (from 2 to 6) |
This function should be combined with the replicate function to give rise to a larger dataset. The algorithm used is a port of the one described in the article of Li which is a multivariate generalization of the algorithm of Naes and Martens.
vector |
|
The dimension of the response (HH in simul_data_complete) depends on the value of ncomp:
| ncomp | HH |
|---|---|
| 2 | 3 |
| 3 | 3 |
| 4 | 4 |
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
T. Naes, H. Martens, Comparison of prediction methods for
multicollinear data, Commun. Stat., Simul. 14 (1985) 545-576.
Baibing Li, Julian Morris, Elaine B. Martin, Model selection for partial least squares
regression, Chemometrics and Intelligent Laboratory Systems 64 (2002)
79-89, doi:10.1016/S0169-7439(02)00051-5.
simul_data_complete for highlighting the simulation
parameters
simul_data_YX(20,6)
# require() attaches plsdepot when it is installed
if(require(plsdepot)){
  dimX <- 6
  Astar <- 2
  (dataAstar2 <- t(replicate(50,simul_data_YX(dimX,Astar))))
  resAstar2 <- plsreg2(dataAstar2[,4:9],dataAstar2[,1:3],comps=5)
  resAstar2$Q2
  resAstar2$Q2[,4]>0.0975
  dimX <- 6
  Astar <- 3
  (dataAstar3 <- t(replicate(50,simul_data_YX(dimX,Astar))))
  resAstar3 <- plsreg2(dataAstar3[,4:9],dataAstar3[,1:3],comps=5)
  resAstar3$Q2
  resAstar3$Q2[,4]>0.0975
  dimX <- 6
  Astar <- 4
  (dataAstar4 <- t(replicate(50,simul_data_YX(dimX,Astar))))
  resAstar4 <- plsreg2(dataAstar4[,5:10],dataAstar4[,1:4],comps=5)
  resAstar4$Q2
  resAstar4$Q2[,5]>0.0975
  rm(list=c("dimX","Astar"))
}
This function provides a summary method for the class
"cv.plsRglmmodel"
## S3 method for class 'cv.plsRglmmodel'
summary(object, ...)
| object | an object of the class "cv.plsRglmmodel" |
|---|---|
| ... | further arguments to be passed to or from methods. |
An object of class "summary.cv.plsRmodel" if model is
missing or model="pls". Otherwise an object of class
"summary.cv.plsRglmmodel".
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
XCornell <- Cornell[,1:7]
yCornell <- Cornell[,8]
summary(cv.plsRglm(Y~.,data=Cornell,nt=10,NK=1, modele="pls-glm-family",family=gaussian(), verbose=FALSE))
rm(list=c("XCornell","yCornell"))
This function provides a summary method for the class "cv.plsRmodel"
## S3 method for class 'cv.plsRmodel'
summary(object, ...)
| object | an object of the class "cv.plsRmodel" |
|---|---|
| ... | further arguments to be passed to or from methods. |
An object of class "summary.cv.plsRmodel".
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
summary(cv.plsR(Y~.,data=Cornell,nt=10,K=6, verbose=FALSE), verbose=FALSE)
Summarizes repeated k-fold cross-validation results from
cv.plsRmulti.
## S3 method for class 'cv.plsRmultiModel'
summary(object, verbose = TRUE, ...)
| object | An object of class "cv.plsRmultiModel" |
|---|---|
| verbose | Should progress information be displayed? |
| ... | Further arguments passed to methods. |
The returned object inherits from "summary.cv.plsRmodel" so that
cvtable and the existing plot method can be reused for the
aggregated multivariate criteria.
A list of per-partition summary matrices with the same aggregate columns used
by summary.cv.plsRmodel for Q2, PRESS, and
RSS, plus response-specific PRESS, RSS, Q2, and
R2 columns.
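As a reminder of how these criteria usually relate to one another, the sketch below computes Q2 from PRESS and RSS using the customary PLS definition Q2_h = 1 - PRESS_h / RSS_(h-1). The numbers are hypothetical and chosen for readability, and the assumption that the summary columns follow exactly this definition should be checked against the package source. The 0.0975 threshold is the one used in the simul_data_YX examples.

```r
# Hypothetical values for illustration (not produced by the package).
RSS   <- c(100, 40, 25, 22)   # RSS with 0, 1, 2 and 3 components
PRESS <- c(55, 30, 24)        # cross-validated PRESS with 1, 2 and 3 components

# Q2 for component h compares PRESS_h with RSS_(h-1).
ratio <- PRESS / RSS[-length(RSS)]
Q2    <- 1 - ratio
Q2cum <- 1 - cumprod(ratio)

# Customary rule: component h is retained while Q2_h exceeds 0.0975.
data.frame(ncomp = 1:3, PRESS = PRESS, Q2 = Q2, Q2cum = Q2cum,
           keep = Q2 > 0.0975)
```

With these made-up inputs the first two components pass the threshold and the third does not.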
set.seed(123)
X <- matrix(rnorm(60 * 4), ncol = 4)
Y <- cbind(
  y1 = X[, 1] - 0.5 * X[, 2] + rnorm(60, sd = 0.1),
  y2 = 0.3 * X[, 2] + X[, 3] + rnorm(60, sd = 0.1)
)
cv_fit <- cv.plsRmulti(Y, X, nt = 2, K = 3, NK = 1, verbose = FALSE)
summary(cv_fit, verbose = FALSE)
This function provides a summary method for the class "plsRglmmodel"
## S3 method for class 'plsRglmmodel'
summary(object, ...)
| object | an object of the class "plsRglmmodel" |
|---|---|
| ... | further arguments to be passed to or from methods. |

| call | function call of plsRglmmodel |
|---|---|
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
XCornell <- Cornell[,1:7]
yCornell <- Cornell[,8]
modplsglm <- plsRglm(yCornell,XCornell,3,modele="pls-glm-gaussian")
class(modplsglm)
summary(modplsglm)
rm(list=c("XCornell","yCornell","modplsglm"))
This function provides a summary method for the class "plsRmodel"
## S3 method for class 'plsRmodel'
summary(object, ...)
| object | an object of the class "plsRmodel" |
|---|---|
| ... | further arguments to be passed to or from methods. |

| call | function call of plsRmodel |
|---|---|
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
Nicolas Meyer, Myriam Maumy-Bertrand et Frédéric Bertrand (2010). Comparing the linear and the logistic PLS regression with qualitative predictors: application to allelotyping data. Journal de la Societe Francaise de Statistique, 151(2), pages 1-18. https://www.numdam.org/item/JSFS_2010__151_2_1_0/
data(Cornell)
XCornell <- Cornell[,1:7]
yCornell <- Cornell[,8]
modpls <- plsR(yCornell,XCornell,3,modele="pls")
class(modpls)
summary(modpls)
rm(list=c("XCornell","yCornell","modpls"))
Provides a wrapper for the bootstrap function tilt.boot from the
boot R package.
Implements non-parametric tilted bootstrap for PLS
regression models by case resampling: the tilt.boot function will
run an initial bootstrap with equal resampling probabilities (if required)
and will use the output of the initial run to find resampling probabilities
which put the value of the statistic at required values. It then runs an
importance resampling bootstrap using the calculated probabilities as the
resampling distribution.
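The idea of finding resampling probabilities that move a statistic to a required value can be sketched in base R for the simplest case, the sample mean. The example below uses exponential tilting and is a simplified illustration under assumptions: the data, the target value theta, and the helper tilt_probs are hypothetical, the initial equal-probability bootstrap is skipped, and none of this is the code used by tilt.boot or by this package.

```r
set.seed(42)
x <- rnorm(30, mean = 10, sd = 2)   # illustrative sample, not package data
n <- length(x)

# Empirical influence values of the sample mean.
infl <- x - mean(x)

# Exponentially tilted probabilities: p_j proportional to exp(lambda * infl_j).
# tilt_probs is a hypothetical helper written for this sketch.
tilt_probs <- function(lambda) {
  w <- exp(lambda * infl)
  w / sum(w)
}

# Choose lambda so that the weighted mean equals the target value theta.
theta <- mean(x) + 0.5   # hypothetical required value of the statistic
lambda <- uniroot(function(l) sum(tilt_probs(l) * x) - theta,
                  interval = c(-5, 5), tol = 1e-10)$root
p <- tilt_probs(lambda)

# Importance resampling: cases are drawn with the tilted probabilities.
boot_means <- replicate(250, mean(sample(x, n, replace = TRUE, prob = p)))
c(target = theta, tilted_mean = sum(p * x), resampled = mean(boot_means))
```

Under the tilted probabilities the weighted mean of the sample matches theta, so the resampled means are centred near the target rather than near the original sample mean.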
tilt.bootpls(
  object,
  typeboot = "plsmodel",
  statistic = coefs.plsR,
  R = c(499, 250, 250),
  alpha = c(0.025, 0.975),
  sim = "ordinary",
  stype = "i",
  index = 1,
  stabvalue = 1e+06,
  ...
)
| object | An object of class |
|---|---|
| typeboot | The type of bootstrap. Either (Y,X) bootstrap ( |
| statistic | A function which when applied to data returns a vector containing the statistic(s) of interest. |
| R | The number of bootstrap replicates. Usually this will be a single positive integer. For importance resampling, some resamples may use one set of weights and others use a different set of weights. In this case |
| alpha | The alpha level to which tilting is required. This parameter is ignored if |
| sim | A character string indicating the type of simulation required. Possible values are |
| stype | A character string indicating what the second argument of |
| index | The index of the statistic of interest in the output from |
| stabvalue | Upper bound for the absolute value of the coefficients. |
| ... | Any further arguments can be passed to |
An object of class "boot".
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
## Not run:
data(Cornell)
XCornell <- Cornell[,1:7]
yCornell <- Cornell[,8]
set.seed(1385)
Cornell.tilt.boot <- tilt.bootpls(plsR(yCornell,XCornell,1), statistic=coefs.plsR, typeboot="fmodel_np", R=c(499, 100, 100), alpha=c(0.025, 0.975), sim="ordinary", stype="i", index=1)
Cornell.tilt.boot
str(Cornell.tilt.boot)
boxplots.bootpls(Cornell.tilt.boot,indices=2:7)
rm(Cornell.tilt.boot)
## End(Not run)
Provides a wrapper for the bootstrap function tilt.boot from the
boot R package.
Implements non-parametric tilted bootstrap for PLS
generalized linear regression models by case resampling: the
tilt.boot function will run an initial bootstrap with equal
resampling probabilities (if required) and will use the output of the
initial run to find resampling probabilities which put the value of the
statistic at required values. It then runs an importance resampling
bootstrap using the calculated probabilities as the resampling distribution.
tilt.bootplsglm(
  object,
  typeboot = "fmodel_np",
  statistic = coefs.plsRglm,
  R = c(499, 250, 250),
  alpha = c(0.025, 0.975),
  sim = "ordinary",
  stype = "i",
  index = 1,
  stabvalue = 1e+06,
  ...
)
| object | An object of class |
|---|---|
| typeboot | The type of bootstrap. Either (Y,X) bootstrap ( |
| statistic | A function which when applied to data returns a vector containing the statistic(s) of interest. |
| R | The number of bootstrap replicates. Usually this will be a single positive integer. For importance resampling, some resamples may use one set of weights and others use a different set of weights. In this case |
| alpha | The alpha level to which tilting is required. This parameter is ignored if |
| sim | A character string indicating the type of simulation required. Possible values are |
| stype | A character string indicating what the second argument of |
| index | The index of the statistic of interest in the output from |
| stabvalue | Upper bound for the absolute value of the coefficients. |
| ... | Any further arguments can be passed to |
An object of class "boot".
Frédéric Bertrand
[email protected]
https://fbertran.github.io/homepage/
data(aze_compl)
Xaze_compl <- aze_compl[,2:34]
yaze_compl <- aze_compl$y
dataset <- cbind(y=yaze_compl,Xaze_compl)
# Lazraq-Cleroux PLS bootstrap Classic
aze_compl.tilt.boot <- tilt.bootplsglm(plsRglm(yaze_compl,Xaze_compl,3, modele="pls-glm-logistic", family=NULL), statistic=coefs.plsRglm, R=c(499, 100, 100), alpha=c(0.025, 0.975), sim="ordinary", stype="i", index=1)
boxplots.bootpls(aze_compl.tilt.boot,1:2)
aze_compl.tilt.boot2 <- tilt.bootplsglm(plsRglm(yaze_compl,Xaze_compl,3, modele="pls-glm-logistic"), statistic=coefs.plsRglm, R=c(499, 100, 100), alpha=c(0.025, 0.975), sim="ordinary", stype="i", index=1)
boxplots.bootpls(aze_compl.tilt.boot2,1:2)
aze_compl.tilt.boot3 <- tilt.bootplsglm(plsRglm(yaze_compl,Xaze_compl,3, modele="pls-glm-family", family=binomial), statistic=coefs.plsRglm, R=c(499, 100, 100), alpha=c(0.025, 0.975), sim="ordinary", stype="i", index=1)
boxplots.bootpls(aze_compl.tilt.boot3,1:2)
# PLS bootstrap balanced
aze_compl.tilt.boot4 <- tilt.bootplsglm(plsRglm(yaze_compl,Xaze_compl,3, modele="pls-glm-logistic"), statistic=coefs.plsRglm, R=c(499, 100, 100), alpha=c(0.025, 0.975), sim="balanced", stype="i", index=1)
boxplots.bootpls(aze_compl.tilt.boot4,1:2)
Quality of Bordeaux wines (Quality) and four potentially predictive
variables (Temperature, Sunshine, Heat and
Rain).
The value of Temperature for the first observation was
removed from the matrix of predictors on purpose.
A data frame with 34 observations on the following 4 variables.
Temperature: a numeric vector
Sunshine: a numeric vector
Heat: a numeric vector
Rain: a numeric vector
P. Bastien, V. Esposito-Vinzi, and M. Tenenhaus. (2005). PLS generalised linear regression. Computational Statistics & Data Analysis, 48(1):17-46.
M. Tenenhaus. (2005). La regression logistique PLS. In J.-J. Droesbeke, M. Lejeune, and G. Saporta, editors, Modeles statistiques pour donnees qualitatives. Editions Technip, Paris.
data(XbordeauxNA)
str(XbordeauxNA)
The caterpillar dataset was extracted from a 1973 study on pine
processionary caterpillars. It assesses the influence of some forest
settlement characteristics on the development of caterpillar colonies. There
are k=10 potentially explanatory variables defined on n=33 areas.
The value of x2 for the first observation was removed from the matrix of
predictors on purpose.
A data frame with 33 observations on the following 10 variables and one missing value.
x1: altitude (in meters)
x2: slope (in degrees)
x3: number of pines in the area
x4: height (in meters) of the tree sampled at the center of the area
x5: diameter (in meters) of the tree sampled at the center of the area
x6: index of the settlement density
x7: orientation of the area (from 1 if southbound to 2 otherwise)
x8: height (in meters) of the dominant tree
x9: number of vegetation strata
x10: mix settlement index (from 1 if not mixed to 2 if mixed)
These caterpillars got their names from their habit of moving over the
ground in incredibly long head-to-tail processions when leaving their nest
to create a new colony.
XpineNAX21 is a dataset with a missing
value, provided for testing purposes.
Tomassone R., Audrain S., Lesquoy-de Turckeim E., Millier C. (1992). “La régression, nouveaux regards sur une ancienne méthode statistique”, INRA, Actualités Scientifiques et Agronomiques, Masson, Paris.
data(XpineNAX21)
str(XpineNAX21)